Execution and Retrieval of Outputs
Explanation
Source: vignettes/pointers/execution-methods.Rmd
Rationale
statim separates declaring a statistical
analysis from running it. Everything up to (and including)
via() builds a lazy specification. conclude()
is the single point where execution actually happens. This separation is
what makes recalibration, inspection, and multi-model comparisons
composable.
This article traces the internal flow from
define_model() through to the retrieval functions
(conclude(), tidy(), display()),
describing what each stage produces and why.
| Stage | Function | Produces | Executes? |
|---|---|---|---|
| Stage 1: layout definition | define_model() | <def_var> | No |
| Stage 2: parameterization | prepare_test() | <test_lazy> | No |
| Stage 2: parameterization | prepare_model() | <model_lazy> | No |
| Stage 2: parameterization | prepare() | <test_lazy> / <model_lazy> | No |
| Stage 2: parameterization | via() | modified lazy object | No |
| Stage 3: output processing | conclude() | cld_exec | Yes |
| Stage 3: output processing | tidy() | tibble | No (reads cld_exec@data) |
| Stage 3: output processing | print() | side effect | No (reads cld_exec@data) |
Stage 1: Defining the Layout
define_model() is an S7 generic dispatched on its first
argument. When called with a data frame in pipe position, it dispatches
on S7::class_data.frame; when called with a model ID or
formula first, it dispatches on the union of those classes.
Either way, it calls model_processor() on the model ID
and the data, then stores the result in a <def_var>
S7 object.
Nothing statistical has been computed. @processed is a
named list of resolved data structures (data frames, scalars) that the
implementation fn will later receive as .proc.
No fit, no test statistic, no p-value.
sleep |> define_model(x_by(extra, group))
#>
#> -- Model Definition ------------------------------------------------------------
#>
#> Variable Mapper : x_by
#> Args : extra | group
#> Other info:
#> x_vars : 1
#> by_vars : 1
#> Variables :
#> extra : <dbl [20]>
#> group : <fct [20]>
Stage 2: Parameterization
Three functions attach an inference specification to the
<def_var>: prepare_test(), prepare_model(), and
prepare(). They differ only in which spec class they produce:
| Preparation function | Spec class | Lazy object class |
|---|---|---|
| prepare_test(.test) | test_spec | test_lazy |
| prepare_model(.model_fn) | model_spec | model_lazy |
| prepare(.fn) | Either | Either |
Internally, these functions call the stat function with
.var_id = NULL. This special sentinel causes the function
to return its test_spec or model_spec rather
than running any computation, which is how prepare_test(),
prepare_model(), and prepare() harvest the
lookup table of stat_define objects and the function’s
name.
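The sentinel behaviour can be illustrated with a small base R toy. This is a sketch only, not statim's implementation: toy_stat(), toy_prepare(), and the list fields below are hypothetical names, and plain lists stand in for the S7 spec and lazy classes.

```r
# Hypothetical stand-in for a stat function such as TTEST(); not statim's code.
toy_stat <- function(.var_id = NULL, .data = NULL, ...) {
  spec <- list(
    name = "toy_stat",
    defs = list(x_by = "definition for x_by models"),  # stand-in for stat_define objects
    args = list(...)
  )
  # Sentinel: with no model ID, hand back the specification and compute nothing.
  if (is.null(.var_id)) {
    return(structure(spec, class = "toy_spec"))
  }
  # With a model ID the function runs eagerly; here it only computes a mean.
  mean(.data[[.var_id]])
}

# Hypothetical prepare step: harvest the spec and attach it to a definition.
toy_prepare <- function(def, .fn) {
  spec <- .fn(.var_id = NULL)
  structure(
    list(model_id = def$model_id, processed = def$processed, spec = spec),
    class = "toy_lazy"
  )
}

def  <- list(model_id = "extra", processed = list(data = sleep))
lazy <- toy_prepare(def, toy_stat)

class(lazy$spec)       # the harvested spec, no statistic computed yet
names(lazy$spec$defs)  # the lookup table travels with the lazy object
```

The point of the pattern is that a single function serves both as the eager entry point and as the source of its own specification, so the preparation step never needs a separate registry of specs.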
The resulting lazy object carries three things:
- @model_id: the original model ID.
- @processed: the resolved data structures from model_processor().
- @model_spec/@test_spec: the spec, including the defs lookup table.
Still nothing has been computed:
sleep |>
define_model(x_by(extra, group)) |>
prepare_test(TTEST)
#>
#> -- Model Definition ------------------------------------------------------------
#>
#> Variable Mapper : x_by
#> Args : extra | group
#>
#> -- Test Specification ----------------------------------------------------------
#>
#> Test : T-Test
#> Method : default
mtcars |>
define_model(mpg ~ wt) |>
prepare_model(LINEAR_REG)
#>
#> mpg ~ wt
#>
#> -- Model Specification ---------------------------------------------------------
#>
#> Model : Linear Regression
#> Method : default
via() recalibrates without executing
via() is an S7 generic dispatched on the lazy object
class and a character string. It validates that the named variant exists
for the current model type, then writes the variant name and any
additional arguments into @recalibrate_spec.
The lazy object is returned modified; no computation has occurred.
Calling via() a second time overwrites
@recalibrate_spec in place.
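As a rough illustration of the validate-then-record behaviour, the following base R sketch mimics it with a plain list. toy_via() and the fields variants and recalibrate_spec are hypothetical stand-ins; the real via() is an S7 generic and may validate differently.

```r
# Hypothetical via()-like helper: validates the variant name, records it,
# and returns the modified object without running anything.
toy_via <- function(lazy, variant, ...) {
  if (!variant %in% lazy$variants) {
    stop(
      "Unknown variant '", variant, "'. Available: ",
      paste(lazy$variants, collapse = ", ")
    )
  }
  lazy$recalibrate_spec <- list(name = variant, args = list(...))
  lazy
}

lazy <- list(variants = c("permute", "boot"), recalibrate_spec = NULL)

once  <- toy_via(lazy, "permute", n = 999L)
twice <- toy_via(once, "boot", n = 200L)

once$recalibrate_spec$name   # "permute"
twice$recalibrate_spec$name  # "boot": the second call replaced the first
```

Because only one slot is written, chaining two calls never accumulates variants; the last call wins.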
sleep |>
define_model(x_by(extra, group)) |>
prepare_test(TTEST) |>
via("permute", n = 999L)
#>
#> -- Model Definition ------------------------------------------------------------
#>
#> Variable Mapper : x_by
#> Args : extra | group
#>
#> -- Test Specification ----------------------------------------------------------
#>
#> Test : T-Test
#> Method : permute
#> Args : n = 999
Stage 3: Execution and output processing
conclude()
conclude() is the terminal step. It is an S7 generic
with separate methods for test_lazy and
model_lazy; both methods follow the same four-step
sequence.
1. Find the matching stat_define. find_def() looks up the defs list from the spec and matches on S7::S7_class(model_id)@name (or "formula" when the model ID is an R formula). This selects the right stat_define for the current model shape.
2. Resolve the variant implementation. If @recalibrate_spec is non-NULL, the variant name is looked up first in def@impl$variants (built-in variants), then in the session-scoped variant_registry (variants added via add_variant()). If via() was never called, def@impl$base is used.
3. Merge arguments. The base arguments from the spec are merged with any arguments supplied to via(), with the via() arguments taking precedence:

   all_args = utils::modifyList(spec@args, recalibrate_spec$args %||% list())

   On the test side, if a state_null() claim was present, the claim_translator on the matching stat_define is also consulted here, turning the parsed hypothesis into named arguments injected alongside .proc.
4. Run the implementation. inject_and_run() resolves each formal of fn in order: .proc is always the processed model output; every other formal is taken from all_args if present, or from the formal's declared default otherwise. A formal with neither a supplied value nor a default is a hard error.
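Steps 2 to 4 can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: toy_resolve_fn(), toy_inject_and_run(), and the nested-list def are hypothetical, plain lists replace S7 objects, and a `...` formal is simply skipped.

```r
# Null-coalescing helper, defined locally so the sketch also runs on older R.
`%||%` <- function(x, y) if (is.null(x)) y else x

# Step 2 (hypothetical): built-in variants first, then a registry, else base.
toy_resolve_fn <- function(def, recalibrate_spec, registry = list()) {
  if (is.null(recalibrate_spec)) return(def$impl$base)
  nm <- recalibrate_spec$name
  def$impl$variants[[nm]] %||% registry[[nm]] %||%
    stop("Variant '", nm, "' not found")
}

# Step 4 (hypothetical): resolve each formal of fn in order.
toy_inject_and_run <- function(fn, .proc, all_args) {
  fmls <- formals(fn)
  call_args <- list()
  for (nm in names(fmls)) {
    if (nm == "...") next                       # not handled in this sketch
    if (nm == ".proc") {
      call_args[nm] <- list(.proc)              # always the processed data
    } else if (nm %in% names(all_args)) {
      call_args[nm] <- list(all_args[[nm]])     # supplied value wins
    } else if (identical(fmls[[nm]], quote(expr = ))) {
      stop("No value and no default for formal '", nm, "'")
    }                                           # otherwise fn's default applies
  }
  do.call(fn, call_args)
}

def <- list(impl = list(
  base = function(.proc, conf_level = 0.95) {
    list(method = "default", n = nrow(.proc), conf_level = conf_level)
  },
  variants = list(
    permute = function(.proc, n, seed = 1L) {
      list(method = "permute", n_perms = n, seed = seed)
    }
  )
))

spec_args <- list(n = 100L, conf_level = 0.9)
recal     <- list(name = "permute", args = list(n = 999L))

# Step 3: via()-style arguments override the spec's arguments.
all_args <- utils::modifyList(spec_args, recal$args %||% list())

fn  <- toy_resolve_fn(def, recal)
out <- toy_inject_and_run(fn, .proc = sleep, all_args = all_args)
out$n_perms  # 999, taken from the recalibration rather than the spec
```

Entries in all_args that do not match a formal (here conf_level for the permute variant) are silently ignored in this sketch; whether the real inject_and_run() does the same is not stated in this article.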
The return value of fn is wrapped in a
cld_exec S7 object:
cld_exec(
data = <raw return value of fn>,
impl_cls = <string identifying the stat and model shape>,
cld_meta = list(
model_id = ...,
processed = ...,
stat_name = ...,
method = <variant name, or "default">,
data_name = ...
)
)
Examples
sleep |>
define_model(x_by(extra, group)) |>
prepare_test(TTEST) |>
conclude()
#>
#> == Model =======================================================================
#>
#> Variable Mapper : x_by
#> Args : extra | group
#> x_vars : 1
#> by_vars : 1
#>
#> == T-Test ======================================================================
#>
#> -- Summary ---------------------------------------------------------------------
#>
#> ──────────────────────────────────────────
#> group estimate t_stat df p_val
#> ──────────────────────────────────────────
#> group -1.580 -1.861 17.780 0.079
#> ──────────────────────────────────────────
#>
#>
#> -- Confidence Interval ---------------------------------------------------------
#>
#> ─────────────────────────────
#> group lower_95 upper_95
#> ─────────────────────────────
#> group -3.365 0.206
#> ─────────────────────────────
mtcars |>
define_model(mpg ~ wt) |>
prepare_model(LINEAR_REG) |>
conclude()
#>
#> == Model =======================================================================
#>
#> Variable Mapper : formula
#> Args : mpg ~ wt
#> left_var : 1
#> right_var : 1
#>
#> == Linear Regression ===========================================================
#>
#> -- Coefficients ----------------------------------------------------------------
#>
#> ──────────────┬───────────────────────────────────────────
#> term │ estimate std_error statistic p_value
#> ──────────────┼───────────────────────────────────────────
#> (Intercept) │ 37.285 1.878 19.858 <0.001
#> wt │ -5.344 0.559 -9.559 <0.001
#> ──────────────┴───────────────────────────────────────────
#>
#>
#> -- Model Fit -------------------------------------------------------------------
#> ------------------------------------------------------
#> R Squared : 0.75 F-statistic : 91.38
#> Adj. R Squared : 0.74 df1 : 1
#> Sigma : 3.05 df2 : 30
#> n : 32 p-value : <0.001
#> df (residual) : 30 :
#> ------------------------------------------------------
With a via() recalibration, the output reflects the
chosen variant:
sleep |>
define_model(x_by(extra, group)) |>
prepare_test(TTEST) |>
via("permute", n = 999L) |>
conclude()
#>
#> == Model =======================================================================
#>
#> Variable Mapper : x_by
#> Args : extra | group
#> x_vars : 1
#> by_vars : 1
#>
#> == T-Test · permute ============================================================
#>
#> ============================== T-test Permutation ==============================
#>
#>
#> -- Summary ---------------------------------------------------------------------
#>
#> ───────────────────────────────
#> Statistic p-value n_perms
#> ───────────────────────────────
#> -1.580 0.102 999
#> ───────────────────────────────
Retrieval of outputs
tidy()
tidy() dispatches on cld_exec. It tries two
paths in order:
1. auto_tidy() (optional but preferred). When cld_exec@data is a <class_stat_infer> subclass, auto_tidy() is called on it directly. S7's method dispatch handles variant-specific overrides: a variant that returns the same class as base inherits the auto_tidy() method for free.
2. making_tidy() registry (escape hatch). When @data is not a <class_stat_infer> subclass (a plain list, an S3 object, etc.), tidy() consults the registry populated by making_tidy() and method_tidy(). This path is only needed for variants that intentionally return a non-standard structure.
In both cases the return value must be a tibble (an object of S3
class tbl_df). An informative error is raised if no
method is found.
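The two-path lookup can be mimicked in base R. The sketch below swaps in S3 classes for S7 and a data.frame for a tibble so that it runs without extra packages; toy_tidy(), toy_auto_tidy(), toy_making_tidy(), the class names, and the idea of keying the registry by impl_cls are all assumptions made for illustration.

```r
# Path 1: class-based dispatch (S3 here, standing in for S7's auto_tidy()).
toy_auto_tidy <- function(x, ...) UseMethod("toy_auto_tidy")
toy_auto_tidy.toy_ttest_two <- function(x, ...) {
  data.frame(estimate = x$estimate, p_val = x$p_val)
}

# Path 2: a registry of tidy functions, keyed by a string (assumed: impl_cls).
toy_registry <- new.env(parent = emptyenv())
toy_making_tidy <- function(impl_cls, fn) {
  assign(impl_cls, fn, envir = toy_registry)
}

toy_tidy <- function(exec) {
  # Preferred path: the payload belongs to the shared result hierarchy.
  if (inherits(exec$data, "toy_stat_infer")) {
    return(toy_auto_tidy(exec$data))
  }
  # Escape hatch: look for a registered function for this implementation.
  fn <- get0(exec$impl_cls, envir = toy_registry, inherits = FALSE)
  if (is.null(fn)) {
    stop("No tidy method found for '", exec$impl_cls, "'")
  }
  fn(exec$data)
}

# A result that reuses the standard class gets path 1 for free.
std <- structure(
  list(estimate = -1.58, p_val = 0.079),
  class = c("toy_ttest_two", "toy_stat_infer")
)
tidy_a <- toy_tidy(list(data = std, impl_cls = "ttest_x_by"))

# A variant returning a plain list must register its own tidy function.
toy_making_tidy("ttest_x_by_permute", function(d) {
  data.frame(statistic = d$stat, p_val = d$p)
})
tidy_b <- toy_tidy(list(
  data = list(stat = -1.58, p = 0.102),
  impl_cls = "ttest_x_by_permute"
))
```

Checking the class first means a registry entry is consulted only when the payload sits outside the shared hierarchy, which keeps the registry small.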
mtcars |>
define_model(mpg ~ wt) |>
prepare_model(LINEAR_REG) |>
conclude() |>
tidy()
#> # A tibble: 2 × 5
#> term estimate std_error statistic p_value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 37.3 1.88 19.9 8.24e-19
#> 2 wt -5.34 0.559 -9.56 1.29e-10
display()
display() on a <multi_exec> prints up
to n individual model outputs from a
write_models() pipeline:
LifeCycleSavings |>
write_models(
f1 = sr ~ 1,
f2 = sr ~ pop15,
f3 = sr ~ pop15 + pop75
) |>
prepare_model(LINEAR_REG) |>
conclude() |>
display(2)
#>
#> 1. f1
#>
#> == Model =======================================================================
#>
#> Variable Mapper : formula
#> Args : sr ~ 1
#> left_var : 1
#> right_var : 0
#>
#> == Linear Regression ===========================================================
#>
#> -- Coefficients ----------------------------------------------------------------
#>
#> ──────────────┬───────────────────────────────────────────
#> term │ estimate std_error statistic p_value
#> ──────────────┼───────────────────────────────────────────
#> (Intercept) │ 9.671 0.634 15.263 <0.001
#> ──────────────┴───────────────────────────────────────────
#>
#>
#> -- Model Fit -------------------------------------------------------------------
#> -----------------------------------------------------
#> R Squared : 0.00 F-statistic : NaN
#> Adj. R Squared : 0.00 df1 : 0
#> Sigma : 4.48 df2 : 49
#> n : 50 p-value : NaN
#> df (residual) : 49 :
#> -----------------------------------------------------
#>
#>
#>
#> 2. f2
#>
#> == Model =======================================================================
#>
#> Variable Mapper : formula
#> Args : sr ~ pop15
#> left_var : 1
#> right_var : 1
#>
#> == Linear Regression ===========================================================
#>
#> -- Coefficients ----------------------------------------------------------------
#>
#> ──────────────┬───────────────────────────────────────────
#> term │ estimate std_error statistic p_value
#> ──────────────┼───────────────────────────────────────────
#> (Intercept) │ 17.497 2.280 7.675 <0.001
#> pop15 │ -0.223 0.063 -3.545 <0.001
#> ──────────────┴───────────────────────────────────────────
#>
#>
#> -- Model Fit -------------------------------------------------------------------
#> ------------------------------------------------------
#> R Squared : 0.21 F-statistic : 12.57
#> Adj. R Squared : 0.19 df1 : 1
#> Sigma : 4.03 df2 : 48
#> n : 50 p-value : <0.001
#> df (residual) : 48 :
#> ------------------------------------------------------
anova()
anova() is a separate generic that operates on model
outputs rather than on cld_exec directly. It dispatches on
<model_lazy>, <cld_exec>,
<multi_lazy>, and <anova_lazy>,
and always returns a <cld_anova>. See the ANOVA for Linear Models article for
the full walkthrough.
Eager path vs lazy path
There is also an eager path: calling
TTEST(x_by(extra, group), sleep) or
LINEAR_REG(mpg ~ wt, mtcars) directly skips
define_model(), prepare_*(), and
via() entirely. Internally, run_stat() calls
find_def() and inject_and_run() directly
against base; only the base implementation is reachable
this way. Variants registered via add_variant() or selected
via via() are not accessible on the eager path.
The eager path returns a cld_exec with the same slot
structure and the same class hierarchy as the lazy path. This means
tidy() and print() work identically on both outputs.
The <class_stat_infer> contract
fn can return anything, but returning a
<class_stat_infer> subclass unlocks automatic
dispatch for both print() and tidy(). The
current hierarchy is:
class_stat_infer
├── anova_able
│ └── class_lm_object (LINEAR_REG)
│ └── class_glm_object (GLM)
├── class_ttest_two (TTEST · x_by)
├── class_ttest_pairwise (TTEST · pairwise)
├── class_corr_two (CORTEST · rel)
└── class_p_test (P_TEST)
A variant that reuses its def’s existing result class inherits both
auto_tidy() and print() automatically. A
variant that needs a genuinely different output shape can opt out by
returning a plain structure and supplying a print function
directly to variant().