Rationale

statim separates declaring a statistical analysis from running it. Everything up to (and including) via() builds a lazy specification. conclude() is the single point where execution actually happens. This separation is what makes recalibration, inspection, and multi-model comparisons composable.

This article traces the internal flow from define_model() through conclude() to the retrieval functions (tidy(), display()), describing what each stage produces and why.

Stage                        Function         Produces                    Executes?
Stage 1: layout definition   define_model()   <def_var>                   No
Stage 2: parameterization    prepare_test()   <test_lazy>                 No
Stage 2: parameterization    prepare_model()  <model_lazy>                No
Stage 2: parameterization    prepare()        <test_lazy> / <model_lazy>  No
Stage 2: parameterization    via()            modified lazy object        No
Stage 3: output processing   conclude()       <cld_exec>                  Yes
Stage 3: output processing   tidy()           tibble                      (reads cld_exec@data)
Stage 3: output processing   print()          side effect                 (reads cld_exec@data)

Stage 1: Defining the Layout

define_model() is an S7 generic dispatched on its first argument. When called with a data frame in pipe position, it dispatches on S7::class_data.frame; when called with a model ID or formula first, it dispatches on the union of those classes.

Either way, it calls model_processor() on the model ID and the data, then stores the result in a <def_var> S7 object:

def_var(
    model_id = <the model ID or formula>,
    processed = <list returned by model_processor()>
)

Nothing statistical has been computed. @processed is a named list of resolved data structures (data frames, scalars) that the implementation fn will later receive as .proc. No fit, no test statistic, no p-value.

sleep |> define_model(x_by(extra, group))
#> 
#> -- Model Definition ------------------------------------------------------------ 
#> 
#> Variable Mapper : x_by 
#> Args : extra | group 
#> Other info:
#>     x_vars : 1 
#>     by_vars : 1 
#> Variables :
#>     extra : <dbl [20]> 
#>     group : <fct [20]>
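
The two calling conventions described above can be imitated in base R. The sketch below is only an illustration: it uses S3 dispatch as a stand-in for S7, and toy_define(), new_toy_def() and the "toy_def" class are hypothetical names, not part of statim.

```r
# Toy stand-in for define_model(): dispatch on the first argument so that
# both `data |> f(model)` and `f(model, data)` reach the same constructor.
# All names are hypothetical; statim itself uses S7 generics, not S3.
toy_define <- function(x, ...) UseMethod("toy_define")

# Data frame first (pipe position): the model ID is the second argument.
toy_define.data.frame <- function(x, model_id, ...) {
    new_toy_def(model_id, data = x)
}

# Formula first: the data frame is the second argument.
toy_define.formula <- function(x, data, ...) {
    new_toy_def(x, data = data)
}

# Shared constructor: resolve the variables, compute nothing statistical.
new_toy_def <- function(model_id, data) {
    vars <- all.vars(model_id)
    structure(
        list(model_id = model_id, processed = data[vars]),
        class = "toy_def"
    )
}

a <- mtcars |> toy_define(mpg ~ wt)
b <- toy_define(mpg ~ wt, mtcars)
names(a$processed)
```

Both routes end in the same constructor, so the stored structure does not depend on which argument came first.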

Stage 2: Parameterization

Three functions attach an inference specification to the <def_var>: prepare_test(), prepare_model(), and prepare(). They differ only in which spec class they produce:

Preparation function      Spec class  Lazy object class
prepare_test(.test)       test_spec   test_lazy
prepare_model(.model_fn)  model_spec  model_lazy
prepare(.fn)              Either      Either

Internally, these functions call the stat function with .var_id = NULL. This special sentinel causes the function to return its test_spec or model_spec rather than running any computation, which is how prepare_test(), prepare_model(), and prepare() harvest the lookup table of stat_define objects and the function’s name.
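
The sentinel idea can be shown in a few lines of base R. This is a minimal sketch under assumptions: TOY_TTEST(), toy_prepare(), the "toy_test_spec" class and the shape of the .var_id argument on the eager branch are all hypothetical, and the real spec objects are S7 classes rather than lists.

```r
# Hypothetical stat function: `.var_id = NULL` is the sentinel that asks for
# the specification instead of a computation. Not statim's actual source.
TOY_TTEST <- function(.var_id = NULL, ...) {
    spec <- structure(
        list(
            name = "TOY_TTEST",
            args = list(...),
            # Lookup table keyed by model shape, one implementation each.
            defs = list(
                x_by = function(.proc, ...) {
                    groups <- split(.proc$x, .proc$by)
                    t.test(groups[[1]], groups[[2]], ...)
                }
            )
        ),
        class = "toy_test_spec"
    )
    if (is.null(.var_id)) {
        return(spec)  # harvest path: nothing is computed
    }
    # Eager path: look up the implementation and run it immediately.
    spec$defs[[.var_id$shape]](.var_id$proc, ...)
}

# What a prepare_*()-style helper would do: call with the sentinel.
toy_prepare <- function(.fn) .fn(.var_id = NULL)

spec <- toy_prepare(TOY_TTEST)
class(spec)
names(spec$defs)
```

Because the same function serves both purposes, the lookup table and the function's metadata stay in one place.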

The resulting lazy object carries three things:

  • @model_id: the original model ID.
  • @processed: the resolved data structures from model_processor().
  • @model_spec / @test_spec: the spec, including the defs lookup table.

Still nothing has been computed:

sleep |>
    define_model(x_by(extra, group)) |>
    prepare_test(TTEST)
#> 
#> -- Model Definition ------------------------------------------------------------ 
#> 
#> Variable Mapper : x_by 
#> Args : extra | group 
#> 
#> -- Test Specification ---------------------------------------------------------- 
#> 
#> Test   : T-Test 
#> Method : default

mtcars |>
    define_model(mpg ~ wt) |>
    prepare_model(LINEAR_REG)
#> 
#> mpg ~ wt
#> 
#> -- Model Specification --------------------------------------------------------- 
#> 
#> Model  : Linear Regression 
#> Method : default

via() recalibrates without executing

via() is an S7 generic dispatched on the lazy object class and a character string. It validates that the named variant exists for the current model type, then writes the variant name and any additional arguments into @recalibrate_spec:

.x@recalibrate_spec = list(method_name = .method, args = list(...))

The lazy object is returned modified; no computation has occurred. Calling via() a second time replaces the previous @recalibrate_spec, so only the last call takes effect.

sleep |>
    define_model(x_by(extra, group)) |>
    prepare_test(TTEST) |>
    via("permute", n = 999L)
#> 
#> -- Model Definition ------------------------------------------------------------ 
#> 
#> Variable Mapper : x_by 
#> Args : extra | group 
#> 
#> -- Test Specification ---------------------------------------------------------- 
#> 
#> Test   : T-Test 
#> Method : permute 
#> Args   : n = 999
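
The recalibration step amounts to validating a name and recording it. The following base-R sketch mimics that with a plain list; toy_via() and the variant table are assumptions made for illustration, while the recalibrate_spec field mirrors the slot described above.

```r
# Toy lazy object as a plain list. Field names mirror the article; the
# helper `toy_via()` and the list of variants are hypothetical.
toy_lazy <- list(
    spec = list(name = "TTEST", variants = c("permute", "boot")),
    recalibrate_spec = NULL
)

toy_via <- function(.x, .method, ...) {
    # Validate before recording, so mistakes surface while still lazy.
    if (!.method %in% .x$spec$variants) {
        stop("Unknown variant '", .method, "' for ", .x$spec$name, call. = FALSE)
    }
    .x$recalibrate_spec <- list(method_name = .method, args = list(...))
    .x  # modified copy; the caller's original object is untouched
}

once  <- toy_via(toy_lazy, "permute", n = 999L)
twice <- toy_via(once, "boot", n = 200L)

is.null(toy_lazy$recalibrate_spec)   # the original has no recalibration
twice$recalibrate_spec$method_name   # the last call wins
```

Since only a name and an argument list are stored, a mistyped variant fails immediately and a valid one costs nothing until execution.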

Stage 3: Execution and output processing

conclude()

conclude() is the terminal step. It is an S7 generic with separate methods for test_lazy and model_lazy; both methods follow the same four-step sequence.

  1. Find the matching stat_define. find_def() looks up the defs list from the spec and matches on S7::S7_class(model_id)@name (or "formula" when the model ID is an R formula), which selects the right stat_define for the current model shape.

  2. Resolve the variant implementation. If @recalibrate_spec is non-NULL, the variant name is looked up first in def@impl$variants (built-in variants), then in the session-scoped variant_registry (variants added via add_variant()). If no via() was called, @recalibrate_spec is NULL and def@impl$base is used.

  3. Merge arguments. The base arguments from the spec are merged with any arguments supplied to via(), with the via() arguments taking precedence:

all_args = utils::modifyList(spec@args, recalibrate_spec$args %||% list())

On the test side, if a state_null() claim was present, the claim_translator on the matching stat_define is also consulted here, turning the parsed hypothesis into named arguments injected alongside .proc.

  4. Run the implementation. inject_and_run() resolves each formal of fn in order: .proc is always the processed model output; every other formal is taken from all_args if present, or from the formal’s declared default otherwise. A formal with neither a supplied value nor a default is a hard error.
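
Steps 2 to 4 can be sketched in base R as follows. This is an illustration under assumptions, not statim's source: toy_def, toy_registry, toy_resolve() and toy_inject_and_run() are hypothetical stand-ins, and the "pooled" variant is invented for the example.

```r
# Null-coalescing helper (built into base R from 4.4.0; defined for older R).
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical definition with a base implementation and one built-in variant.
toy_def <- list(impl = list(
    base = function(.proc, conf_level = 0.95) {
        t.test(.proc$x, .proc$y, conf.level = conf_level)
    },
    variants = list(
        pooled = function(.proc, conf_level = 0.95) {
            t.test(.proc$x, .proc$y, conf.level = conf_level, var.equal = TRUE)
        }
    )
))
toy_registry <- list()  # stand-in for the session-scoped variant registry

# Step 2: built-in variants first, then the registry, else the base.
toy_resolve <- function(def, recalibrate_spec, registry) {
    if (is.null(recalibrate_spec)) return(def$impl$base)
    name <- recalibrate_spec$method_name
    fn <- def$impl$variants[[name]] %||% registry[[name]]
    if (is.null(fn)) stop("No variant named '", name, "'", call. = FALSE)
    fn
}

# Step 4: fill each formal from `.proc`, the merged args, or its default.
toy_inject_and_run <- function(fn, .proc, all_args) {
    call_args <- list()
    fmls <- formals(fn)
    for (nm in names(fmls)) {
        if (nm == ".proc") {
            call_args[nm] <- list(.proc)
        } else if (nm %in% names(all_args)) {
            call_args[nm] <- all_args[nm]
        } else if (identical(fmls[[nm]], quote(expr = ))) {
            # An empty symbol means the formal declares no default.
            stop("No value and no default for argument '", nm, "'", call. = FALSE)
        }
        # Otherwise the formal is left out so its declared default applies.
    }
    do.call(fn, call_args)
}

proc <- split(sleep$extra, sleep$group)
names(proc) <- c("x", "y")
recal <- list(method_name = "pooled", args = list(conf_level = 0.9))

# Step 3: via() arguments override the spec's base arguments.
all_args <- utils::modifyList(list(conf_level = 0.95), recal$args %||% list())
res <- toy_inject_and_run(toy_resolve(toy_def, recal, toy_registry), proc, all_args)
```

One property of utils::modifyList() worth keeping in mind is that a NULL value in the second list removes the corresponding element from the first, so passing NULL through via() would drop a base argument rather than set it to NULL.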

The return value of fn is wrapped in a cld_exec S7 object:

cld_exec(
    data = <raw return value of fn>,
    impl_cls = <string identifying the stat and model shape>,
    cld_meta = list(
        model_id = ...,
        processed = ...,
        stat_name = ...,
        method = <variant name, or "default">,
        data_name = ...
    )
)

Examples

sleep |>
    define_model(x_by(extra, group)) |>
    prepare_test(TTEST) |>
    conclude()
#> 
#> == Model ======================================================================= 
#> 
#> Variable Mapper : x_by 
#> Args : extra | group 
#>     x_vars : 1 
#>     by_vars : 1 
#> 
#> == T-Test ====================================================================== 
#> 
#> -- Summary ---------------------------------------------------------------------
#> 
#> ──────────────────────────────────────────
#>   group  estimate  t_stat    df    p_val  
#> ──────────────────────────────────────────
#>   group   -1.580   -1.861  17.780  0.079  
#> ──────────────────────────────────────────
#> 
#> 
#> -- Confidence Interval ---------------------------------------------------------
#> 
#> ─────────────────────────────
#>   group  lower_95  upper_95  
#> ─────────────────────────────
#>   group   -3.365    0.206    
#> ─────────────────────────────

mtcars |>
    define_model(mpg ~ wt) |>
    prepare_model(LINEAR_REG) |>
    conclude()
#> 
#> == Model ======================================================================= 
#> 
#> Variable Mapper : formula 
#> Args : mpg ~ wt 
#>     left_var : 1 
#>     right_var : 1 
#> 
#> == Linear Regression =========================================================== 
#> 
#> -- Coefficients ----------------------------------------------------------------
#> 
#> ──────────────┬───────────────────────────────────────────
#>   term        │  estimate  std_error  statistic  p_value  
#> ──────────────┼───────────────────────────────────────────
#>   (Intercept) │   37.285     1.878     19.858    <0.001   
#>   wt          │   -5.344     0.559     -9.559    <0.001   
#> ──────────────┴───────────────────────────────────────────
#> 
#> 
#> -- Model Fit -------------------------------------------------------------------
#> ------------------------------------------------------
#>   R Squared      :    0.75    F-statistic :    91.38
#>   Adj. R Squared :    0.74    df1         :        1
#>   Sigma          :    3.05    df2         :       30
#>   n              :      32    p-value     :   <0.001
#>   df (residual)  :      30                :         
#> ------------------------------------------------------

With a via() recalibration, the output reflects the chosen variant:

sleep |>
    define_model(x_by(extra, group)) |>
    prepare_test(TTEST) |>
    via("permute", n = 999L) |>
    conclude()
#> 
#> == Model ======================================================================= 
#> 
#> Variable Mapper : x_by 
#> Args : extra | group 
#>     x_vars : 1 
#>     by_vars : 1 
#> 
#> == T-Test · permute ============================================================ 
#> 
#> ============================== T-test Permutation ==============================
#> 
#> 
#> -- Summary ---------------------------------------------------------------------
#> 
#> ───────────────────────────────
#>   Statistic  p-value  n_perms  
#> ───────────────────────────────
#>    -1.580     0.102     999    
#> ───────────────────────────────
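
For readers unfamiliar with permutation tests, the idea behind an output of this shape can be sketched in base R. The assumptions here are loud: the statistic is taken to be the difference in group means and the p-value two-sided, which may not be what statim's "permute" variant does internally, and the p-value depends on the random seed.

```r
# Assumption: statistic = mean(group 1) - mean(group 2), two-sided p-value.
# statim's "permute" variant may use a different statistic or convention.
set.seed(1)
n_perms <- 999L
x <- sleep$extra
g <- sleep$group

mean_diff <- function(values, groups) {
    m <- tapply(values, groups, mean)
    unname(m[1] - m[2])
}

observed <- mean_diff(x, g)  # -1.58, as in the Statistic column above

# Shuffle the group labels and recompute the statistic each time.
permuted <- replicate(n_perms, mean_diff(x, sample(g)))

# Two-sided p-value; the +1 terms count the observed arrangement itself.
p_value <- (1 + sum(abs(permuted) >= abs(observed))) / (n_perms + 1)

observed
p_value
```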

Retrieval of outputs

tidy()

tidy() dispatches on cld_exec. It tries two paths in order:

  1. auto_tidy() (optional but preferred). When cld_exec@data is a <class_stat_infer> subclass, auto_tidy() is called on it directly. S7’s method dispatch handles variant-specific overrides: a variant that returns the same class as base inherits the auto_tidy() method for free.

  2. making_tidy() registry (escape hatch). When @data is not a <class_stat_infer> subclass (a plain list, an S3 object, etc.), tidy() consults the registry populated by making_tidy() and method_tidy(). This path is only needed for variants that intentionally return a non-standard structure.

In both cases the return value must be a tibble (S3 class tbl_df). An informative error is raised if no method is found.

mtcars |>
    define_model(mpg ~ wt) |>
    prepare_model(LINEAR_REG) |>
    conclude() |>
    tidy()
#> # A tibble: 2 × 5
#>   term        estimate std_error statistic  p_value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)    37.3      1.88      19.9  8.24e-19
#> 2 wt             -5.34     0.559     -9.56 1.29e-10
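
The two lookup paths can be imitated in base R. In this sketch S3 classes stand in for the S7 hierarchy, an environment stands in for the registry, and plain data frames are returned instead of tibbles to avoid a dependency; toy_tidy(), toy_auto_tidy(), toy_tidy_registry and the "toy_*" classes are hypothetical names.

```r
# Registry for results that are not "toy_stat_infer" subclasses (escape hatch).
toy_tidy_registry <- new.env()

# Path 1: a generic with methods keyed on the result class.
toy_auto_tidy <- function(x, ...) UseMethod("toy_auto_tidy")
toy_auto_tidy.toy_lm <- function(x, ...) {
    co <- x$coefficients
    data.frame(term = rownames(co), estimate = co[, "Estimate"], row.names = NULL)
}

toy_tidy <- function(exec) {
    data <- exec$data
    if (inherits(data, "toy_stat_infer")) {
        return(toy_auto_tidy(data))
    }
    # Path 2: look up a tidier registered under the implementation ID.
    fn <- toy_tidy_registry[[exec$impl_cls]]
    if (is.null(fn)) {
        stop("No tidy method found for '", exec$impl_cls, "'", call. = FALSE)
    }
    fn(data)
}

fit <- summary(lm(mpg ~ wt, data = mtcars))

# A structured result takes path 1.
structured <- list(
    data = structure(
        list(coefficients = coef(fit)),
        class = c("toy_lm", "toy_stat_infer")
    ),
    impl_cls = "toy_lm_formula"
)

# A plain list needs a registered tidier (path 2).
plain <- list(data = list(r2 = fit$r.squared), impl_cls = "toy_r2_only")
assign("toy_r2_only", function(d) data.frame(r_squared = d$r2),
       envir = toy_tidy_registry)

toy_tidy(structured)
toy_tidy(plain)
```

A variant whose result carries the same class vector as the base result would reach toy_auto_tidy.toy_lm() without any extra registration, which is the inheritance benefit described in path 1.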

display()

display() on a <multi_exec> prints up to n individual model outputs from a write_models() pipeline:

LifeCycleSavings |>
    write_models(
        f1 = sr ~ 1,
        f2 = sr ~ pop15,
        f3 = sr ~ pop15 + pop75
    ) |>
    prepare_model(LINEAR_REG) |>
    conclude() |>
    display(2)
#> 
#> 1. f1
#> 
#> == Model ======================================================================= 
#> 
#> Variable Mapper : formula 
#> Args : sr ~ 1 
#>     left_var : 1 
#>     right_var : 0 
#> 
#> == Linear Regression =========================================================== 
#> 
#> -- Coefficients ----------------------------------------------------------------
#> 
#> ──────────────┬───────────────────────────────────────────
#>   term        │  estimate  std_error  statistic  p_value  
#> ──────────────┼───────────────────────────────────────────
#>   (Intercept) │   9.671      0.634     15.263    <0.001   
#> ──────────────┴───────────────────────────────────────────
#> 
#> 
#> -- Model Fit -------------------------------------------------------------------
#> -----------------------------------------------------
#>   R Squared      :    0.00    F-statistic :     NaN
#>   Adj. R Squared :    0.00    df1         :       0
#>   Sigma          :    4.48    df2         :      49
#>   n              :      50    p-value     :     NaN
#>   df (residual)  :      49                :        
#> -----------------------------------------------------
#> 
#> 
#> 
#> 2. f2
#> 
#> == Model ======================================================================= 
#> 
#> Variable Mapper : formula 
#> Args : sr ~ pop15 
#>     left_var : 1 
#>     right_var : 1 
#> 
#> == Linear Regression =========================================================== 
#> 
#> -- Coefficients ----------------------------------------------------------------
#> 
#> ──────────────┬───────────────────────────────────────────
#>   term        │  estimate  std_error  statistic  p_value  
#> ──────────────┼───────────────────────────────────────────
#>   (Intercept) │   17.497     2.280      7.675    <0.001   
#>   pop15       │   -0.223     0.063     -3.545    <0.001   
#> ──────────────┴───────────────────────────────────────────
#> 
#> 
#> -- Model Fit -------------------------------------------------------------------
#> ------------------------------------------------------
#>   R Squared      :    0.21    F-statistic :    12.57
#>   Adj. R Squared :    0.19    df1         :        1
#>   Sigma          :    4.03    df2         :       48
#>   n              :      50    p-value     :   <0.001
#>   df (residual)  :      48                :         
#> ------------------------------------------------------

anova()

anova() is a separate generic that operates on model outputs rather than on cld_exec directly. It dispatches on <model_lazy>, <cld_exec>, <multi_lazy>, and <anova_lazy>, and always returns a <cld_anova>. See the ANOVA for Linear Models article for the full walkthrough.

Eager path vs lazy path

There is also an eager path: calling TTEST(x_by(extra, group), sleep) or LINEAR_REG(mpg ~ wt, mtcars) directly skips define_model(), prepare_*(), and via() entirely. Internally, run_stat() calls find_def() and inject_and_run() directly against base; only the base implementation is reachable this way. Variants registered via add_variant() or selected via via() are not accessible on the eager path.

The eager path returns a cld_exec with the same slot structure as the lazy path, and the class hierarchy it belongs to is identical. This means tidy() and print() work identically on both outputs.

The <class_stat_infer> contract

fn can return anything, but returning a <class_stat_infer> subclass unlocks automatic dispatch for both print() and tidy(). The current hierarchy is:

class_stat_infer
    ├── anova_able
    │       ├── class_lm_object       (LINEAR_REG)
    │       └── class_glm_object      (GLM)
    ├── class_ttest_two               (TTEST · x_by)
    ├── class_ttest_pairwise          (TTEST · pairwise)
    ├── class_corr_two                (CORTEST · rel)
    └── class_p_test                  (P_TEST)

A variant that reuses its def’s existing result class inherits both auto_tidy() and print() automatically. A variant that needs a genuinely different output shape can opt out by returning a plain structure and supplying a print function directly to variant().