Smart constructors: basic types → containers.
Builder functions validate inputs and construct frozen container instances. Each builder accepts primitive types and returns a frozen attrs container.
Functions:
| Name | Description |
|---|---|
| append_inference_columns | Append standard inference columns to a DataFrame if available. |
| build_cv_state | Build a CVState from cross-validation computation. |
| build_effects_dataframe | Build the .effects DataFrame from marginal effects state. |
| build_fit_state | Build a FitState instance with validation. |
| build_inference_state | Build an InferenceState from computed inference values. |
| build_joint_test_dataframe | Build an ANOVA-style DataFrame from joint test results. |
| build_joint_test_state | Build a JointTestState from computed joint test values. |
| build_mee_resamples | Build ResamplesState from MEE inference if samples are available. |
| build_mee_state | Build a MeeState from marginal effects computation. |
| build_model_spec | Build a ModelSpec from raw inputs. |
| build_model_spec_from_formula | Build ModelSpec from a pre-parsed formula structure and resolve defaults. |
| build_params_dataframe | Build the .params DataFrame from fit state. |
| build_params_resamples | Build ResamplesState from params inference if samples are available. |
| build_prediction_state | Build a PredictionState from prediction computation. |
| build_predictions_dataframe | Build the .predictions DataFrame from prediction state. |
| build_resamples_dataframe | Build a long-format DataFrame of raw resampled values. |
| build_resamples_state | Build a ResamplesState from resampling results. |
| build_simulation_inference_state | Build a SimulationInferenceState from computed values. |
| build_simulation_spec | Build a SimulationSpec for data generation. |
| build_simulation_spec_from_formula | Build SimulationSpec from formula with defaults for unspecified variables. |
| build_simulations_dataframe | Build the .simulations DataFrame with optional inference columns. |
| build_varying_corr_dataframe | Build the .varying_corr DataFrame from random effect correlations. |
| build_varying_offsets_dataframe | Build the .varying_offsets DataFrame from varying state. |
| build_varying_params_dataframe | Build the .varying_params DataFrame (population + offsets). |
| build_varying_spec | Build a VaryingSpec for random effect structure. |
| build_varying_spread_dataframe | Build the .varying_spread DataFrame from variance components. |
| build_varying_spread_state | Build a VaryingSpreadState from variance component estimates. |
| build_varying_state | Build a VaryingState from computed BLUPs. |
| extract_mee_names | Extract human-readable names from a MeeState. |
| get_varying_random_terms | Get all random terms (Intercept + slope terms) for a VaryingSpec. |
Modules:
| Name | Description |
|---|---|
| dataframes | DataFrame builders for user-facing property accessors. |
| resamples | Builder functions for resamples-related containers. |
| results | Result DataFrame assembly utilities. |
| specs | Builder functions for specification containers. |
| state | Builder functions for computation state containers. |
Functions
append_inference_columns
append_inference_columns(df: pl.DataFrame, state: object, method: str | None = None) -> pl.DataFrame
Append standard inference columns to a DataFrame if available.
Checks state.has_inference and, when True, adds each inference
column that is not None on state. Columns are added in canonical
order: se, ci_lower, ci_upper, statistic, df, p_value.
When method is "perm", the ci_lower and ci_upper
columns are excluded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Base DataFrame to augment (not mutated). | required |
| state | object | Object with a has_inference bool and optional se, statistic, df, p_value, ci_lower, ci_upper array attributes. | required |
| method | str \| None | Inference method ("asymp", "boot", "perm", or None). Controls which columns are included. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | A new DataFrame with inference columns appended (or the original DataFrame unchanged if inference is not available). |
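A minimal sketch of the column-appending rule described above, assuming DataFrames are modeled as plain dicts of column lists (the real function works on Polars DataFrames) and that `state` is any object exposing `has_inference` plus the six optional attributes. The name `append_inference_columns_sketch` is hypothetical.

```python
from types import SimpleNamespace

# Canonical column order documented for append_inference_columns.
CANONICAL_ORDER = ("se", "ci_lower", "ci_upper", "statistic", "df", "p_value")


def append_inference_columns_sketch(df, state, method=None):
    """Return a new dict-of-columns with the available inference columns appended.

    Hypothetical stand-in: `df` maps column name -> list of values.
    """
    if not getattr(state, "has_inference", False):
        return df  # no inference: hand back the original unchanged
    out = dict(df)  # shallow copy so the caller's dict is not mutated
    for name in CANONICAL_ORDER:
        # Permutation inference carries no confidence interval columns.
        if method == "perm" and name in ("ci_lower", "ci_upper"):
            continue
        values = getattr(state, name, None)
        if values is not None:  # skip attributes that were never computed
            out[name] = list(values)
    return out


base = {"term": ["Intercept", "x"], "estimate": [1.0, 2.0]}
state = SimpleNamespace(
    has_inference=True, se=[0.1, 0.2], ci_lower=[0.8, 1.6], ci_upper=[1.2, 2.4],
    statistic=[10.0, 10.0], df=[98.0, 98.0], p_value=[0.001, 0.001],
)
print(list(append_inference_columns_sketch(base, state, method="perm")))
# ['term', 'estimate', 'se', 'statistic', 'df', 'p_value']
```

Copying before adding columns mirrors the documented "not mutated" guarantee on `df`.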
build_cv_state
build_cv_state(k: int, rmse: float, mae: float, r_squared: float, *, deviance: float | None = None, accuracy: float | None = None, sensitivity: float | None = None, specificity: float | None = None, f1: float | None = None, auc: float | None = None, fold_metrics: dict[str, np.ndarray] | None = None, oos_predictions: np.ndarray | None = None, oos_residuals: np.ndarray | None = None, fold_assignments: np.ndarray | None = None) -> CVState
Build a CVState from cross-validation computation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| k | int | Number of folds used. | required |
| rmse | float | Root mean squared error. | required |
| mae | float | Mean absolute error. | required |
| r_squared | float | Coefficient of determination. | required |
| deviance | float \| None | Mean deviance (GLM only). | None |
| accuracy | float \| None | Classification accuracy (binomial only). | None |
| sensitivity | float \| None | Sensitivity / true positive rate (binomial only). | None |
| specificity | float \| None | Specificity / true negative rate (binomial only). | None |
| f1 | float \| None | F1 score (binomial only). | None |
| auc | float \| None | Area under the ROC curve (binomial only). | None |
| fold_metrics | dict[str, ndarray] \| None | Per-fold metrics dictionary. | None |
| oos_predictions | ndarray \| None | Out-of-sample predictions. | None |
| oos_residuals | ndarray \| None | Out-of-sample residuals. | None |
| fold_assignments | ndarray \| None | Array indicating which fold each observation belongs to. | None |
Returns:
| Type | Description |
|---|---|
| CVState | Frozen CVState instance. |
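The scalar metrics passed to build_cv_state are conventionally computed from out-of-sample predictions. The sketch below uses the standard definitions of RMSE, MAE, and R² (1 − SS_res / SS_tot, with SS_tot taken around the overall mean); whether the library pools folds this way is an assumption, and `cv_metrics` is a hypothetical helper.

```python
import math


def cv_metrics(y, oos_pred):
    """Compute RMSE, MAE and R-squared from pooled out-of-sample predictions."""
    n = len(y)
    residuals = [yi - pi for yi, pi in zip(y, oos_pred)]
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    return rmse, mae, r_squared


y = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.0]
rmse, mae, r2 = cv_metrics(y, pred)
# These values could then be passed on, e.g. build_cv_state(k=4, rmse=rmse, mae=mae, r_squared=r2)
```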
Examples:
>>> state = build_cv_state(
... k=10,
... rmse=0.523,
... mae=0.412,
... r_squared=0.891,
... )
build_effects_dataframe
build_effects_dataframe(mee: MeeState, method: str | None = None) -> pl.DataFrame
Build the .effects DataFrame from marginal effects state.
Column set varies by inference method: bootstrap excludes p_value,
permutation excludes ci_lower/ci_upper.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mee | MeeState | MeeState with grid, estimates, and optional inference. | required |
| method | str \| None | Inference method ("asymp", "boot", "perm", or None). Controls which inference columns are included. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with grid columns, estimate, and method-appropriate inference columns. |
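A sketch of the method-dependent column rule stated above. It assumes the full column set matches the canonical order listed under append_inference_columns, that only the two documented exclusions apply, and that method=None means no inference columns at all; `effects_inference_columns` is a hypothetical helper.

```python
ALL_INFERENCE = ("se", "ci_lower", "ci_upper", "statistic", "df", "p_value")


def effects_inference_columns(method):
    """Return the inference columns an .effects table would carry for `method`."""
    if method is None:
        return ()  # assumed: grid columns and estimate only
    excluded = {
        "boot": {"p_value"},  # bootstrap: no p_value column
        "perm": {"ci_lower", "ci_upper"},  # permutation: no CI bounds
    }.get(method, set())
    return tuple(c for c in ALL_INFERENCE if c not in excluded)


print(effects_inference_columns("perm"))  # ('se', 'statistic', 'df', 'p_value')
```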
build_fit_state
build_fit_state(*, coef: NDArray[np.floating], vcov: NDArray[np.floating], fitted: NDArray[np.floating], residuals: NDArray[np.floating], leverage: NDArray[np.floating], df_resid: float, loglik: float, converged: bool = True, n_iter: int = 1, sigma: float | None = None, dispersion: float | None = None, null_deviance: float | None = None, deviance: float | None = None, theta: NDArray[np.floating] | None = None, u: NDArray[np.floating] | None = None, irls_weights: NDArray[np.floating] | None = None, XtWX_inv: NDArray[np.floating] | None = None) -> FitState
Build a FitState instance with validation.
This builder function provides a keyword-only interface for constructing FitState instances, ensuring all required fields are explicitly provided.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| coef | NDArray[floating] | Coefficient estimates (1D array of length p). | required |
| vcov | NDArray[floating] | Variance-covariance matrix (p x p array). | required |
| fitted | NDArray[floating] | Fitted values (1D array of length n). | required |
| residuals | NDArray[floating] | Residuals (1D array of length n). | required |
| leverage | NDArray[floating] | Hat matrix diagonal / leverage values (1D array of length n). | required |
| df_resid | float | Residual degrees of freedom. | required |
| loglik | float | Log-likelihood at convergence. | required |
| converged | bool | Whether the optimization converged. | True |
| n_iter | int | Number of iterations (1 for closed-form solutions). | 1 |
| sigma | float \| None | Residual standard deviation (OLS models only). | None |
| dispersion | float \| None | Dispersion parameter (GLM models only). | None |
| null_deviance | float \| None | Null model deviance (GLM models only). | None |
| deviance | float \| None | Residual deviance, sum of unit deviances (GLM models only). | None |
| theta | NDArray[floating] \| None | Random effect variance parameters (mixed models only). | None |
| u | NDArray[floating] \| None | Spherical random effects (mixed models only). | None |
| irls_weights | NDArray[floating] \| None | IRLS weights from GLM fit (GLM sandwich estimator). | None |
| XtWX_inv | NDArray[floating] \| None | Inverse of X'WX from GLM fit (GLM sandwich estimator). | None |
Returns:
| Type | Description |
|---|---|
| FitState | A new FitState instance. |
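This reference says build_fit_state validates its inputs but does not list the checks. A plausible sketch, assuming validation covers the shape constraints in the parameter table (coef of length p, vcov p x p, and fitted, residuals, and leverage all of length n), is shown below; `check_fit_shapes` is hypothetical and uses plain lists instead of NumPy arrays.

```python
def check_fit_shapes(coef, vcov, fitted, residuals, leverage):
    """Raise ValueError if the documented shape constraints are violated.

    Returns (p, n) on success. vcov is given as a list of rows.
    """
    p = len(coef)
    if len(vcov) != p or any(len(row) != p for row in vcov):
        raise ValueError(f"vcov must be {p} x {p}")
    n = len(fitted)
    for name, arr in (("residuals", residuals), ("leverage", leverage)):
        if len(arr) != n:
            raise ValueError(f"{name} has length {len(arr)}, expected {n}")
    return p, n


print(check_fit_shapes([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]],
                       [1.0, 2.0, 3.0], [0.1, -0.1, 0.0], [0.3, 0.3, 0.4]))  # (2, 3)
```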
Examples:
>>> import numpy as np
>>> from state import build_fit_state
>>> state = build_fit_state(
... coef=np.array([1.0, 2.0]),
... vcov=np.eye(2),
... fitted=np.array([1.0, 2.0, 3.0]),
... residuals=np.array([0.1, -0.1, 0.0]),
... leverage=np.array([0.3, 0.3, 0.4]),
... df_resid=1.0,
... loglik=-10.0,
... sigma=0.5,
... )
>>> state.sigma
0.5
build_inference_state
build_inference_state(se: np.ndarray, statistic: np.ndarray, df: np.ndarray, p_value: np.ndarray, ci_lower: np.ndarray, ci_upper: np.ndarray, *, conf_level: float = 0.95, method: str = 'asymp', null: float = 0.0, alternative: str = 'two-sided', n_resamples: int | None = None, boot_samples: np.ndarray | None = None, perm_samples: np.ndarray | None = None, pre: np.ndarray | None = None, pre_sd: np.ndarray | None = None) -> InferenceState
Build an InferenceState from computed inference values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| se | ndarray | Standard errors for each coefficient. | required |
| statistic | ndarray | Test statistics (t or z). | required |
| df | ndarray | Degrees of freedom. | required |
| p_value | ndarray | P-values. | required |
| ci_lower | ndarray | Lower confidence interval bounds. | required |
| ci_upper | ndarray | Upper confidence interval bounds. | required |
| conf_level | float | Confidence level. | 0.95 |
| method | str | Inference method ("asymp", "boot", "perm", "cv"). | 'asymp' |
| null | float | Null hypothesis value. | 0.0 |
| alternative | str | Alternative hypothesis direction. | 'two-sided' |
| n_resamples | int \| None | Number of bootstrap/permutation resamples. | None |
| boot_samples | ndarray \| None | Raw bootstrap samples. | None |
| perm_samples | ndarray \| None | Null distribution of test statistics from permutation tests. | None |
| pre | ndarray \| None | PRE (Proportion Reduction in Error) per coefficient (CV ablation). | None |
| pre_sd | ndarray \| None | Standard deviation of PRE across CV folds (CV ablation). | None |
Returns:
| Type | Description |
|---|---|
| InferenceState | Frozen InferenceState instance. |
Examples:
>>> state = build_inference_state(
... se=np.array([0.1, 0.2]),
... statistic=np.array([5.0, 2.5]),
... df=np.array([98.0, 98.0]),
... p_value=np.array([0.001, 0.014]),
... ci_lower=np.array([0.3, 0.1]),
... ci_upper=np.array([0.7, 0.9]),
... )
build_joint_test_dataframe
build_joint_test_dataframe(state: JointTestState) -> pl.DataFrame
Build an ANOVA-style DataFrame from joint test results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| state | JointTestState | JointTestState with terms, df, statistics, and p-values. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with term, df1, optional df2, f_ratio or Chisq, and p_value columns. |
build_joint_test_state
build_joint_test_state(terms: tuple[str, ...] | list[str], df1: np.ndarray, statistic: np.ndarray, p_value: np.ndarray, *, test_type: str = 'F', ss_type: str = 'III', df2: np.ndarray | None = None) -> JointTestState
Build a JointTestState from computed joint test values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| terms | tuple[str, ...] \| list[str] | Names of terms being tested. | required |
| df1 | ndarray | Numerator degrees of freedom per term. | required |
| statistic | ndarray | Test statistic values (F or chi2). | required |
| p_value | ndarray | P-values for each term. | required |
| test_type | str | Type of test ("F" for linear models, "chi2" for GLMs). | 'F' |
| ss_type | str | Sum of squares type ("II" or "III"). | 'III' |
| df2 | ndarray \| None | Denominator degrees of freedom (required for F-tests). | None |
Returns:
| Type | Description |
|---|---|
| JointTestState | Frozen JointTestState instance. |
Examples:
F-test results (linear model):
>>> state = build_joint_test_state(
... terms=("a", "b", "a:b"),
... df1=np.array([2, 1, 2]),
... df2=np.array([94, 94, 94]),
... statistic=np.array([5.2, 12.1, 0.8]),
... p_value=np.array([0.007, 0.001, 0.45]),
... test_type="F",
... )
Chi-square results (GLM):
>>> state = build_joint_test_state(
... terms=("a", "b"),
... df1=np.array([2, 1]),
... statistic=np.array([8.5, 15.2]),
... p_value=np.array([0.014, 0.0001]),
... test_type="chi2",
... )
build_mee_resamples
build_mee_resamples(mee: MeeState | None, samples: np.ndarray | None, how: str) -> ResamplesState | None
Build ResamplesState from MEE inference if samples are available.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mee | MeeState \| None | The MeeState from explore, or None. | required |
| samples | ndarray \| None | Raw resample array from dispatch_mee_inference, or None. | required |
| how | str | Inference method used ("boot", "perm", etc.). | required |
Returns:
| Type | Description |
|---|---|
| ResamplesState \| None | ResamplesState if boot/perm samples were saved, else None. |
build_mee_state
build_mee_state(grid: 'pl.DataFrame', estimate: np.ndarray, explore_formula: str, focal_var: str, mee_type: str, *, how: str = 'mem', effect_scale: str = 'link', L_matrix: np.ndarray | None = None, contrast_method: str | None = None, n_contrast_levels: int | None = None, link: str | None = None, L_matrix_link: np.ndarray | None = None, boot_X_plus: np.ndarray | None = None, boot_X_minus: np.ndarray | None = None, boot_delta: float | None = None, se: np.ndarray | None = None, df: np.ndarray | None = None, statistic: np.ndarray | None = None, p_value: np.ndarray | None = None, ci_lower: np.ndarray | None = None, ci_upper: np.ndarray | None = None, conf_level: float | None = None) -> MeeState
Build a MeeState from marginal effects computation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| grid | 'pl.DataFrame' | Polars DataFrame with the evaluation grid. | required |
| estimate | ndarray | Point estimates for each grid row. | required |
| explore_formula | str | The explore formula string. | required |
| focal_var | str | The primary variable being explored. | required |
| mee_type | str | Type of effect ("means", "slopes", "contrasts"). | required |
| how | str | Averaging method: "mem" (Marginal Estimated Mean, balanced reference grid) or "ame" (Average Marginal Effect, g-computation over observed data). | 'mem' |
| effect_scale | str | Scale of estimates: "link" (linear predictor) or "response" (inverse-link / data scale). | 'link' |
| L_matrix | ndarray \| None | Design matrix for delta-method inference, shape (n_estimates, n_coef). For EMMs this is X_ref. | None |
| contrast_method | str \| None | Original contrast type for multiplicity adjustment ("pairwise", "sequential", "poly", "treatment", "sum", "helmert", or None). | None |
| n_contrast_levels | int \| None | Number of EMM levels before contrasting (family size). | None |
| link | str \| None | Link function name for response-scale CI back-transformation. | None |
| L_matrix_link | ndarray \| None | Link-scale L_matrix for CI back-transformation. | None |
| boot_X_plus | ndarray \| None | Per-combo average design matrix at focal_var + delta/2, used for exact response-scale bootstrap AME recomputation. | None |
| boot_X_minus | ndarray \| None | Per-combo average design matrix at focal_var - delta/2. | None |
| boot_delta | float \| None | Finite-difference step size for bootstrap slope recomputation. | None |
| se | ndarray \| None | Standard errors (from .infer()). | None |
| df | ndarray \| None | Degrees of freedom. | None |
| statistic | ndarray \| None | Test statistics. | None |
| p_value | ndarray \| None | P-values. | None |
| ci_lower | ndarray \| None | Lower CI bounds. | None |
| ci_upper | ndarray \| None | Upper CI bounds. | None |
| conf_level | float \| None | Confidence level. | None |
Returns:
| Type | Description |
|---|---|
| MeeState | Frozen MeeState instance. |
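The boot_X_plus, boot_X_minus, and boot_delta fields support recomputing slopes by a central finite difference, evaluating the model at focal_var + delta/2 and focal_var - delta/2. A minimal sketch of that difference on the response scale, assuming a logit link, a linear predictor X · coef, and one design row per grid combination (all helper names are hypothetical; the library's own averaging is not reproduced):

```python
import math


def inv_logit(eta):
    """Inverse of the logit link (the link assumed for this sketch)."""
    return 1.0 / (1.0 + math.exp(-eta))


def dot(row, coef):
    """Linear predictor for a single design row."""
    return sum(x * b for x, b in zip(row, coef))


def finite_difference_slopes(X_plus, X_minus, coef, delta, inv_link=inv_logit):
    """Central-difference response-scale slope, one value per grid combination.

    Row i of X_plus / X_minus is the design row at focal_var + delta/2 and
    focal_var - delta/2, respectively.
    """
    return [
        (inv_link(dot(xp, coef)) - inv_link(dot(xm, coef))) / delta
        for xp, xm in zip(X_plus, X_minus)
    ]


# Design rows [1, x] around x = 0 with coefficients [0, 1].
delta = 1e-4
slopes = finite_difference_slopes([[1.0, delta / 2]], [[1.0, -delta / 2]], [0.0, 1.0], delta)
print(slopes)  # close to 0.25, the logistic derivative p * (1 - p) at p = 0.5
```

The symmetric step makes the truncation error proportional to delta², which is why a small fixed delta is usually adequate.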
Examples:
>>> import polars as pl
>>> grid = pl.DataFrame({"treatment": ["A", "B", "C"]})
>>> state = build_mee_state(
... grid=grid,
... estimate=np.array([1.0, 2.0, 3.0]),
... explore_formula="treatment",
... focal_var="treatment",
... mee_type="means",
... )
>>> state.has_inference
False
build_model_spec
build_model_spec(formula: str, *, family: str = 'gaussian', link: str | None = None, method: str | None = None, response_var: str | None = None, fixed_terms: tuple[str, ...] | list[str] | None = None, random_terms: tuple[str, ...] | list[str] | None = None, has_random_effects: bool | None = None) -> ModelSpec
Build a ModelSpec from raw inputs.
This factory function handles defaults, validation, and inference of missing fields. In a full implementation, formula parsing would extract response_var, fixed_terms, and random_terms automatically.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| formula | str | The model formula string. | required |
| family | str | Distribution family. | 'gaussian' |
| link | str \| None | Link function. If None, uses the canonical link for the family. | None |
| method | str \| None | Estimation method. If None, inferred from family and random-effect structure. | None |
| response_var | str \| None | Response variable name. Required if not parsed. | None |
| fixed_terms | tuple[str, ...] \| list[str] \| None | Fixed effect terms. Required if not parsed. | None |
| random_terms | tuple[str, ...] \| list[str] \| None | Random effect terms (None is treated as an empty tuple). | None |
| has_random_effects | bool \| None | Whether the model has random effects. If None, inferred from random_terms. | None |
Returns:
| Type | Description |
|---|---|
| ModelSpec | A validated ModelSpec instance. |
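When link is None, build_model_spec falls back to the family's canonical link. The canonical links for common exponential-family distributions are standard; which families the library supports and how it spells them are assumptions in this sketch, and `resolve_link` is hypothetical.

```python
# Canonical links for common exponential-family distributions.
# The family spellings used as keys are assumptions for this sketch.
CANONICAL_LINKS = {
    "gaussian": "identity",
    "binomial": "logit",
    "poisson": "log",
    "gamma": "inverse",
}


def resolve_link(family, link=None):
    """Return the explicit link if one was given, else the family's canonical link."""
    if link is not None:
        return link
    try:
        return CANONICAL_LINKS[family]
    except KeyError:
        raise ValueError(f"no canonical link registered for family {family!r}") from None


print(resolve_link("binomial"))  # logit
```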
Examples:
>>> spec = build_model_spec(
... formula="y ~ x + treatment",
... response_var="y",
... fixed_terms=["Intercept", "x", "treatment"],
... )
>>> spec.method
'ols'
build_model_spec_from_formula
build_model_spec_from_formula(formula: str, *, family: str = 'gaussian', link: str | None = None, method: str | None = None, structure: FormulaStructure) -> ModelSpec
Build ModelSpec from a pre-parsed formula structure and resolve defaults.
The caller must parse the formula into a FormulaStructure first
(via extract_formula_structure). This keeps containers/ free
of imports from formula/.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| formula | str | R-style model formula (e.g., "y ~ x + (1 \| group)"). | required |
| family | str | Distribution family. | 'gaussian' |
| link | str \| None | Link function. If None, uses the canonical link for the family. | None |
| method | str \| None | Estimation method. If None, inferred from family and random-effect presence; if specified, validated against family/random-effect constraints. | None |
| structure | FormulaStructure | Pre-parsed formula structure from extract_formula_structure(formula). | required |
Returns:
| Type | Description |
|---|---|
| ModelSpec | A validated ModelSpec instance. |
Examples:
>>> from parse import extract_formula_structure
>>> s = extract_formula_structure("y ~ x + treatment")
>>> spec = build_model_spec_from_formula("y ~ x + treatment", structure=s)
>>> spec.method
'ols'
build_params_dataframe
build_params_dataframe(bundle: DataBundle, fit: FitState, params_inference: InferenceState | None) -> pl.DataFrame
Build the .params DataFrame from fit state.
Column set varies by inference method:
- asymp: all inference columns (p_value last).
- boot: all inference columns; df = n_resamples.
- perm: percentile CIs; df = n_valid_resamples.
- cv: PRE columns only.
- None: term and estimate only.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bundle | DataBundle | Data bundle containing X_names (coefficient labels). | required |
| fit | FitState | Fit state containing coef (coefficient estimates). | required |
| params_inference | InferenceState \| None | Optional inference state with SE, CI, p-values. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with term, estimate, and method-appropriate inference columns. |
build_params_resamples
build_params_resamples(inference: InferenceState | None, fit_coef: np.ndarray, x_names: tuple[str, ...], how: str) -> ResamplesState | None
Build ResamplesState from params inference if samples are available.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| inference | InferenceState \| None | The InferenceState from params inference, or None. | required |
| fit_coef | ndarray | Coefficient estimates from the FitState. | required |
| x_names | tuple[str, ...] | Design matrix column names from the DataBundle. | required |
| how | str | Inference method used ("boot", "perm", etc.). | required |
Returns:
| Type | Description |
|---|---|
| ResamplesState \| None | ResamplesState if boot/perm samples were saved, else None. |
build_prediction_state
build_prediction_state(fitted: np.ndarray, *, link: np.ndarray | None = None, X_pred: np.ndarray | None = None, config: PredictionConfig | None = None, se: np.ndarray | None = None, ci_lower: np.ndarray | None = None, ci_upper: np.ndarray | None = None, interval_type: str | None = None, conf_level: float | None = None, grid: 'pl.DataFrame | None' = None) -> PredictionState
Build a PredictionState from prediction computation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fitted | ndarray | Predicted values on the response scale. | required |
| link | ndarray \| None | Predicted values on the link scale (for GLM/GLMM). | None |
| X_pred | ndarray \| None | Design matrix used for predictions. Stored so that .infer() can compute delta-method SEs on the correct X. | None |
| config | PredictionConfig \| None | Prediction configuration for bootstrap replay. | None |
| se | ndarray \| None | Standard errors of predictions. | None |
| ci_lower | ndarray \| None | Lower interval bounds. | None |
| ci_upper | ndarray \| None | Upper interval bounds. | None |
| interval_type | str \| None | Type of interval ("confidence" or "prediction"). | None |
| conf_level | float \| None | Confidence level for intervals. | None |
| grid | 'pl.DataFrame \| None' | Grid DataFrame for formula-mode predictions. When present, build_predictions_dataframe() prepends these columns. | None |
Returns:
| Type | Description |
|---|---|
| PredictionState | Frozen PredictionState instance. |
Examples:
>>> state = build_prediction_state(
... fitted=np.array([1.0, 2.0, 3.0]),
... )
>>> state.has_inference
False
>>> # With inference
>>> state = build_prediction_state(
... fitted=np.array([1.0, 2.0, 3.0]),
... se=np.array([0.1, 0.1, 0.1]),
... ci_lower=np.array([0.8, 1.8, 2.8]),
... ci_upper=np.array([1.2, 2.2, 3.2]),
... interval_type="confidence",
... conf_level=0.95,
... )
>>> state.has_inference
True
build_predictions_dataframe
build_predictions_dataframe(pred: PredictionState) -> pl.DataFrame
Build the .predictions DataFrame from prediction state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pred | PredictionState | Prediction state with fitted values and optional link-scale, inference, and CV columns. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with optional grid columns (formula mode), fitted, optional link, inference columns, and optional CV columns. |
build_resamples_dataframe
build_resamples_dataframe(rs: ResamplesState) -> pl.DataFrame
Build a long-format DataFrame of raw resampled values.
Returns one row per (resample, term) combination holding the raw resampled value: a coefficient estimate for bootstrap, or a null t-statistic for permutation.
Columns: resample (int), term (str), value (float).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rs | ResamplesState | Frozen ResamplesState from bootstrap or permutation inference. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | Polars DataFrame with n_resamples × k rows, where k is the number of terms/effects. |
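The long-format layout described above is a reshape of the (n_resamples, k) samples array: each cell becomes one row tagged with its resample index and term name. A stdlib sketch, assuming the array arrives as a list of rows and that resample indices start at 0 (the real numbering may differ); `to_long_format` is hypothetical.

```python
def to_long_format(samples, names):
    """Flatten an (n_resamples, k) nested list into (resample, term, value) rows."""
    rows = []
    for i, draw in enumerate(samples):
        if len(draw) != len(names):
            raise ValueError("each resample must have exactly one value per term")
        for term, value in zip(names, draw):
            rows.append((i, term, float(value)))
    return rows


long_rows = to_long_format([[0.9, 2.1], [1.1, 1.9], [1.0, 2.0]], ("Intercept", "x"))
print(len(long_rows))  # 3 resamples x 2 terms = 6 rows
```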
Examples:
>>> df = build_resamples_dataframe(rs)  # rs holds 100 resamples over 10 terms
>>> df.columns
['resample', 'term', 'value']
>>> df.shape
(1000, 3)
build_resamples_state
build_resamples_state(*, samples: NDArray[np.floating], observed: NDArray[np.floating], names: tuple[str, ...] | list[str], method: str, n_resamples: int, context: str) -> ResamplesState
Build a ResamplesState from resampling results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | NDArray[floating] | Resampled statistics array, shape (n_resamples, k). | required |
| observed | NDArray[floating] | Observed statistics, shape (k,). | required |
| names | tuple[str, ...] \| list[str] | Term/effect names corresponding to columns of samples. | required |
| method | str | Resampling method ("boot" or "perm"). | required |
| n_resamples | int | Number of resamples. | required |
| context | str | What was resampled ("params" or "effects"). | required |
Returns:
| Type | Description |
|---|---|
| ResamplesState | Frozen ResamplesState instance. |
Examples:
>>> state = build_resamples_state(
... samples=np.random.randn(100, 2),
... observed=np.array([1.0, 2.0]),
... names=("Intercept", "x"),
... method="boot",
... n_resamples=100,
... context="params",
... )
>>> state.method
'boot'
build_simulation_inference_state
build_simulation_inference_state(sim_type: str, n_sims: int, *, sim_mean: np.ndarray | None = None, sim_sd: np.ndarray | None = None, sim_quantiles: dict[str, np.ndarray] | None = None, power: dict[str, float] | None = None, coverage: dict[str, float] | None = None, bias: dict[str, float] | None = None, rmse: dict[str, float] | None = None, alpha: float = 0.05, true_coef: dict[str, float] | None = None) -> SimulationInferenceState
Build a SimulationInferenceState from computed values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sim_type | str | Type of simulation ("post_fit" or "power_analysis"). | required |
| n_sims | int | Number of simulations. | required |
| sim_mean | ndarray \| None | Mean of simulated values per observation. | None |
| sim_sd | ndarray \| None | SD of simulated values per observation. | None |
| sim_quantiles | dict[str, ndarray] \| None | Mapping of quantile name -> array. | None |
| power | dict[str, float] \| None | Mapping of term name -> power. | None |
| coverage | dict[str, float] \| None | Mapping of term name -> coverage. | None |
| bias | dict[str, float] \| None | Mapping of term name -> bias. | None |
| rmse | dict[str, float] \| None | Mapping of term name -> RMSE. | None |
| alpha | float | Significance level for power calculation. | 0.05 |
| true_coef | dict[str, float] \| None | True coefficient values for coverage/bias. | None |
Returns:
| Type | Description |
|---|---|
| SimulationInferenceState | Frozen SimulationInferenceState instance. |
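The per-term power, coverage, bias, and rmse entries have standard meanings in simulation studies: power is the share of simulations rejecting at alpha, coverage the share of confidence intervals containing the true coefficient, bias the mean estimate minus the truth, and RMSE the root mean squared deviation from the truth. A sketch for one term, assuming per-simulation estimates, p-values, and CI bounds are available as lists; `summarize_term` is hypothetical and may differ from how the library computes these.

```python
import math


def summarize_term(estimates, p_values, ci_lower, ci_upper, true_value, alpha=0.05):
    """Standard simulation-study summaries for a single coefficient."""
    n_sims = len(estimates)
    power = sum(p < alpha for p in p_values) / n_sims
    coverage = sum(lo <= true_value <= hi for lo, hi in zip(ci_lower, ci_upper)) / n_sims
    bias = sum(estimates) / n_sims - true_value
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n_sims)
    return {"power": power, "coverage": coverage, "bias": bias, "rmse": rmse}


summary = summarize_term(
    estimates=[0.4, 0.6, 0.5, 0.7],
    p_values=[0.01, 0.03, 0.20, 0.001],
    ci_lower=[0.1, 0.3, -0.1, 0.45],
    ci_upper=[0.7, 0.9, 1.1, 0.95],
    true_value=0.5,
)
print(summary["power"], summary["coverage"])  # 0.75 1.0
```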
Examples:
>>> state = build_simulation_inference_state(
... sim_type="post_fit",
... n_sims=100,
... sim_mean=np.array([1.0, 2.0, 3.0]),
... sim_sd=np.array([0.1, 0.2, 0.3]),
... )
build_simulation_spec
build_simulation_spec(n: int, *, distributions: dict[str, Distribution] | None = None, coef: dict[str, float] | None = None, sigma: float = 1.0, re_spec: dict[str, VaryingSpec] | None = None, seed: int | None = None) -> SimulationSpec
Build a SimulationSpec for data generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n | int | Total number of observations. | required |
| distributions | dict[str, Distribution] \| None | Mapping of variable name -> Distribution. | None |
| coef | dict[str, float] \| None | Mapping of coefficient name -> value. | None |
| sigma | float | Residual standard deviation. | 1.0 |
| re_spec | dict[str, VaryingSpec] \| None | Mapping of grouping variable -> VaryingSpec. | None |
| seed | int \| None | Random seed for reproducibility. | None |
Returns:
| Type | Description |
|---|---|
| SimulationSpec | SimulationSpec instance. |
Examples:
>>> spec = build_simulation_spec(n=100)
build_simulation_spec_from_formula
build_simulation_spec_from_formula(formula: str, n: int, *, distributions: dict[str, Distribution] | None = None, coef: dict[str, float] | None = None, sigma: float = 1.0, seed: int | None = None) -> SimulationSpec
Build SimulationSpec from formula with defaults for unspecified variables.
Parses the formula to identify variable types and creates appropriate default distributions for variables not explicitly specified.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| formula | str | Model formula (e.g., "y ~ x + factor(group) + (1 \| subject)"). | required |
| n | int | Number of observations to generate. | required |
| distributions | dict[str, Distribution] \| None | User-provided distributions for specific variables. | None |
| coef | dict[str, float] \| None | True coefficient values (defaults to all zeros). | None |
| sigma | float | Residual standard deviation. | 1.0 |
| seed | int \| None | Random seed. | None |
Returns:
| Type | Description |
|---|---|
| SimulationSpec | SimulationSpec ready for data generation. |
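A rough sketch of the variable-classification step described above, assuming that factor(...) marks a categorical predictor, a (... | g) block marks a grouping variable, and every other bare name on the right-hand side is numeric. The regex parser and the returned labels are hypothetical; the library relies on its own formula parser, and this sketch does not choose the default distributions.

```python
import re


def classify_formula_variables(formula):
    """Split a simple formula into response, numeric, factor and grouping variables."""
    response, rhs = (part.strip() for part in formula.split("~", 1))
    # Random-effect blocks such as (1 | subject): keep the grouping variable.
    groups = re.findall(r"\(\s*[^|()]*\|\s*(\w+)\s*\)", rhs)
    rhs = re.sub(r"\([^()]*\|[^()]*\)", "", rhs)
    # factor(name) marks a categorical predictor.
    factors = re.findall(r"factor\(\s*(\w+)\s*\)", rhs)
    rhs = re.sub(r"factor\(\s*\w+\s*\)", "", rhs)
    # Remaining identifiers are treated as numeric predictors.
    numeric = re.findall(r"[A-Za-z_]\w*", rhs)
    return {"response": response, "numeric": numeric, "factor": factors, "group": groups}


print(classify_formula_variables("y ~ x + factor(group) + (1 | subject)"))
# {'response': 'y', 'numeric': ['x'], 'factor': ['group'], 'group': ['subject']}
```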
Examples:
>>> spec = build_simulation_spec_from_formula("y ~ x + factor(group)", n=100)
>>> spec.n
100
build_simulations_dataframe
build_simulations_dataframe(simulations: pl.DataFrame, sim_inference: SimulationInferenceState | None) -> pl.DataFrame
Build the .simulations DataFrame with optional inference columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| simulations | DataFrame | Base simulations DataFrame (generated data or sim columns). | required |
| sim_inference | SimulationInferenceState \| None | Optional simulation inference state with summary stats. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with simulation data and optional sim_mean, sim_sd, and quantile columns. |
build_varying_corr_dataframe
build_varying_corr_dataframe(varying_spread: VaryingSpreadState) -> pl.DataFrame
Build the .varying_corr DataFrame from random effect correlations.
Extracts correlation entries from the VaryingSpreadState rho dict
into a tidy DataFrame. Returns an empty DataFrame (with the correct
schema) when no correlations are present (e.g., intercept-only or
diagonal RE structures).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| varying_spread | VaryingSpreadState | Variance component state containing rho (dict mapping "effect1:effect2" keys to correlation values). | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns group, effect1, effect2, corr. |
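The rho keys follow the "effect1:effect2" convention described above. A sketch of turning such a dict into tidy rows, assuming a single grouping factor whose name is passed in separately (the `group` argument and the helper name are hypothetical, because this reference does not say how the group label is attached). Splitting on the first colon is also an assumption and would misread effect names that themselves contain a colon.

```python
def corr_rows(rho, group):
    """Convert {"effect1:effect2": corr} into (group, effect1, effect2, corr) rows."""
    rows = []
    for key, value in (rho or {}).items():
        effect1, effect2 = key.split(":", 1)  # split on the first colon only
        rows.append((group, effect1, effect2, float(value)))
    return rows  # empty when there are no correlations, as for intercept-only RE


print(corr_rows({"Intercept:x": -0.3}, "subject"))  # [('subject', 'Intercept', 'x', -0.3)]
```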
build_varying_offsets_dataframe
build_varying_offsets_dataframe(varying_offsets: VaryingState) -> pl.DataFrame
Build the .varying_offsets DataFrame from varying state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| varying_offsets | VaryingState | Varying state with grid, effects, and optional prediction intervals (PIs). | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with group, level, effect columns, and optional prediction interval columns. |
build_varying_params_dataframe
build_varying_params_dataframe(bundle: DataBundle, fit: FitState, varying_offsets: VaryingState) -> pl.DataFrame
Build the .varying_params DataFrame (population + offsets).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
bundle | DataBundle | Data bundle containing X_names for population param lookup. | required |
fit | FitState | Fit state containing coef (population coefficients). | required |
varying_offsets | VaryingState | Varying state with grid and per-group effects. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with group, level, and effect columns where each value is population_param + BLUP. |
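The `population_param + BLUP` rule can be sketched with plain Python containers. This assumes the keys of the per-group offsets match entries in X_names and that the `group` column holds the grouping-variable name; `varying_params_rows` and its argument layout are hypothetical, not the library's API.

```python
from __future__ import annotations


def varying_params_rows(
    x_names: tuple[str, ...],
    coef: list[float],
    levels: list[str],
    blups: dict[str, list[float]],
    group: str,
) -> list[dict[str, object]]:
    """Per-group parameter = population coefficient + group offset (BLUP)."""
    # Look up each population coefficient by its design-matrix column name.
    population = dict(zip(x_names, coef))
    rows: list[dict[str, object]] = []
    for i, level in enumerate(levels):
        row: dict[str, object] = {"group": group, "level": level}
        for effect, offsets in blups.items():
            row[effect] = population[effect] + offsets[i]
        rows.append(row)
    return rows


rows = varying_params_rows(
    ("Intercept", "x"), [2.0, 0.5], ["S1", "S2"], {"Intercept": [0.5, -0.25]}, group="subject"
)
```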
build_varying_spec
build_varying_spec(n: int, sd: float = 1.0, *, slope_sds: dict[str, float] | None = None, correlations: dict[tuple[str, str], float] | None = None, n_per: int | None = None) -> VaryingSpec
Build a VaryingSpec for random effect structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n | int | Number of groups. | required |
sd | float | Standard deviation for random intercept. | 1.0 |
slope_sds | dict[str, float] | None | Dictionary of term -> slope SD mappings. | None |
correlations | dict[tuple[str, str], float] | None | Dictionary of (term1, term2) -> correlation mappings. | None |
n_per | int | None | Number of units per group for nested effects. | None |
Returns:
| Type | Description |
|---|---|
VaryingSpec | VaryingSpec instance. |
Examples:
>>> spec = build_varying_spec(n=50, sd=0.3)
build_varying_spread_dataframe
build_varying_spread_dataframe(varying_spread: VaryingSpreadState) -> pl.DataFrame
Build the .varying_spread DataFrame from variance components.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
varying_spread | VaryingSpreadState | Variance component state with components DataFrame and optional CI information. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with component, estimate, and optional ci_lower, ci_upper, ci_method columns. |
build_varying_spread_state
build_varying_spread_state(components: 'pl.DataFrame', sigma2: float, tau2: dict[str, float], *, rho: dict[str, float] | None = None, icc: float | None = None, ci_lower: dict[str, float] | None = None, ci_upper: dict[str, float] | None = None, conf_level: float | None = None, ci_method: str | None = None) -> VaryingSpreadState
Build a VaryingSpreadState from variance component estimates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
components | ‘pl.DataFrame’ | Polars DataFrame with component estimates. | required |
sigma2 | float | Residual variance. | required |
tau2 | dict[str, float] | Dict mapping effect names to variance estimates. | required |
rho | dict[str, float] | None | Dict mapping effect pairs to correlations (optional). | None |
icc | float | None | Intraclass correlation coefficient (optional). | None |
ci_lower | dict[str, float] | None | Lower CI bounds (optional, from .infer()). | None |
ci_upper | dict[str, float] | None | Upper CI bounds (optional, from .infer()). | None |
conf_level | float | None | Confidence level (optional). | None |
ci_method | str | None | CI method used (optional). | None |
Returns:
| Type | Description |
|---|---|
VaryingSpreadState | Frozen VaryingSpreadState instance. |
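The `icc=0.33` used in the example that follows matches the textbook random-intercept definition ICC = tau2 / (tau2 + sigma2) = 0.5 / 1.5. A minimal sketch of that formula, assuming a single random intercept; how the library defines the ICC for models with random slopes is not stated here, and `intercept_icc` is a hypothetical name.

```python
def intercept_icc(sigma2: float, tau2_intercept: float) -> float:
    """Share of total variance that lies between groups (random-intercept model)."""
    total = tau2_intercept + sigma2
    if total <= 0:
        raise ValueError("total variance must be positive")
    return tau2_intercept / total


icc = intercept_icc(sigma2=1.0, tau2_intercept=0.5)
print(round(icc, 2))  # 0.33
```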
Examples:
>>> import polars as pl
>>> components = pl.DataFrame({
... "component": ["sigma2", "tau2_Intercept", "icc"],
... "estimate": [1.0, 0.5, 0.33],
... })
>>> state = build_varying_spread_state(
... components=components,
... sigma2=1.0,
... tau2={"Intercept": 0.5},
... icc=0.33,
... )
build_varying_state
build_varying_state(grid: 'pl.DataFrame', effects: dict[str, np.ndarray], grouping_var: str, n_groups: int, *, pi_lower: dict[str, np.ndarray] | None = None, pi_upper: dict[str, np.ndarray] | None = None, conf_level: float | None = None) -> VaryingState
Build a VaryingState from computed BLUPs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
grid | ‘pl.DataFrame’ | Polars DataFrame with group identifiers. | required |
effects | dict[str, ndarray] | Dict mapping effect names to BLUP arrays. | required |
grouping_var | str | Name of the grouping variable. | required |
n_groups | int | Number of groups. | required |
pi_lower | dict[str, ndarray] | None | Lower prediction interval bounds (optional). | None |
pi_upper | dict[str, ndarray] | None | Upper prediction interval bounds (optional). | None |
conf_level | float | None | Confidence level for intervals (optional). | None |
Returns:
| Type | Description |
|---|---|
VaryingState | Frozen VaryingState instance. |
Examples:
>>> import numpy as np
>>> import polars as pl
>>> state = build_varying_state(
... grid=pl.DataFrame({"subject": ["S1", "S2", "S3"]}),
... effects={"Intercept": np.array([0.5, -0.3, 0.1])},
... grouping_var="subject",
... n_groups=3,
... )
extract_mee_names
extract_mee_names(mee: MeeState) -> tuple[str, ...]
Extract human-readable names from a MeeState.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mee | MeeState | The MeeState to extract names from. | required |
Returns:
| Type | Description |
|---|---|
tuple[str, ...] | Tuple of effect names for each estimate. |
get_varying_random_terms
get_varying_random_terms(spec: VaryingSpec) -> tuple[str, ...]
Get all random terms (Intercept + slope terms) for a VaryingSpec.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | VaryingSpec | A VaryingSpec instance. | required |
Returns:
| Type | Description |
|---|---|
tuple[str, ...] | Tuple of random term names, starting with “Intercept”. |
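A minimal sketch of the documented contract ("Intercept" first, then one name per slope term), working directly from the `slope_sds` mapping accepted by build_varying_spec. The helper name and the ordering of the slope terms (dict insertion order) are assumptions.

```python
from __future__ import annotations


def random_terms(slope_sds: dict[str, float] | None) -> tuple[str, ...]:
    """Intercept first, then each slope term named in slope_sds."""
    # Assumption: slope terms keep the insertion order of the slope_sds dict.
    return ("Intercept", *(slope_sds or {}))
```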
Modules
dataframes
DataFrame builders for user-facing property accessors.
Pure functions that assemble Polars DataFrames from internal state
containers. Each builder corresponds to a model property (.params,
.effects, etc.) and contains only the DataFrame construction logic
that was previously inlined in core.py property methods.
Functions:
| Name | Description |
|---|---|
build_effects_dataframe | Build the .effects DataFrame from marginal effects state. |
build_joint_test_dataframe | Build an ANOVA-style DataFrame from joint test results. |
build_params_dataframe | Build the .params DataFrame from fit state. |
build_predictions_dataframe | Build the .predictions DataFrame from prediction state. |
build_simulations_dataframe | Build the .simulations DataFrame with optional inference columns. |
build_varying_corr_dataframe | Build the .varying_corr DataFrame from random effect correlations. |
build_varying_offsets_dataframe | Build the .varying_offsets DataFrame from varying state. |
build_varying_params_dataframe | Build the .varying_params DataFrame (population + offsets). |
build_varying_spread_dataframe | Build the .varying_spread DataFrame from variance components. |
Functions
build_effects_dataframe
build_effects_dataframe(mee: MeeState, method: str | None = None) -> pl.DataFrame
Build the .effects DataFrame from marginal effects state.
Column set varies by inference method: bootstrap excludes p_value,
permutation excludes ci_lower/ci_upper.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mee | MeeState | MeeState with grid, estimates, and optional inference. | required |
method | str | None | Inference method ("asymp", "boot", "perm", or None). Controls which inference columns are included. | None |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with grid columns, estimate, and method-appropriate inference columns. |
build_joint_test_dataframe
build_joint_test_dataframe(state: JointTestState) -> pl.DataFrame
Build an ANOVA-style DataFrame from joint test results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | JointTestState | JointTestState with terms, df, statistics, and p-values. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with term, df1, optional df2, f_ratio or Chisq, and p_value columns. |
build_params_dataframe
build_params_dataframe(bundle: DataBundle, fit: FitState, params_inference: InferenceState | None) -> pl.DataFrame
Build the .params DataFrame from fit state.
Column set varies by inference method:
- asymp: all inference columns (p_value last).
- boot: all inference columns; df = n_resamples.
- perm: percentile CIs; df = n_valid_resamples.
- cv: PRE columns only.
- None: term + estimate only.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
bundle | DataBundle | Data bundle containing X_names (coefficient labels). | required |
fit | FitState | Fit state containing coef (coefficient estimates). | required |
params_inference | InferenceState | None | Optional inference state with SE, CI, p-values. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with term, estimate, and method-appropriate inference columns. |
build_predictions_dataframe
build_predictions_dataframe(pred: PredictionState) -> pl.DataFrame
Build the .predictions DataFrame from prediction state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pred | PredictionState | Prediction state with fitted values and optional link-scale, inference, and CV columns. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with optional grid columns (formula mode), fitted, optional link, inference columns, and optional CV columns. |
build_simulations_dataframe
build_simulations_dataframe(simulations: pl.DataFrame, sim_inference: SimulationInferenceState | None) -> pl.DataFrame
Build the .simulations DataFrame with optional inference columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
simulations | DataFrame | Base simulations DataFrame (generated data or sim columns). | required |
sim_inference | SimulationInferenceState | None | Optional simulation inference state with summary stats. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with simulation data and optional sim_mean, sim_sd, and quantile columns. |
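The optional summary columns can be pictured as per-observation statistics over the simulated draws. A stdlib sketch, assuming each observation's draws arrive as a list and assuming a 95% equal-tailed interval; `summarize_sims` and the `sim_lower`/`sim_upper` names and levels are hypothetical, since the text above says only "quantile columns". Quantiles use linear interpolation, the same rule as NumPy's default `method="linear"`.

```python
from __future__ import annotations

import math
import statistics


def _quantile(values: list[float], q: float) -> float:
    """Linear-interpolation quantile at virtual index (n - 1) * q."""
    xs = sorted(values)
    h = (len(xs) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def summarize_sims(
    sims_per_row: list[list[float]], lower: float = 0.025, upper: float = 0.975
) -> list[dict[str, float]]:
    """Per-observation summary of simulated draws (hypothetical column names)."""
    out = []
    for draws in sims_per_row:
        out.append({
            "sim_mean": statistics.mean(draws),
            "sim_sd": statistics.stdev(draws),  # sample SD; needs at least 2 draws
            "sim_lower": _quantile(draws, lower),
            "sim_upper": _quantile(draws, upper),
        })
    return out


summary = summarize_sims([[1.0, 2.0, 3.0, 4.0, 5.0]])
```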
build_varying_corr_dataframe
build_varying_corr_dataframe(varying_spread: VaryingSpreadState) -> pl.DataFrame
Build the .varying_corr DataFrame from random effect correlations.
Extracts correlation entries from the VaryingSpreadState rho dict
into a tidy DataFrame. Returns an empty DataFrame (with the correct
schema) when no correlations are present (e.g., intercept-only or
diagonal RE structures).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
varying_spread | VaryingSpreadState | Variance component state containing rho (dict mapping "effect1:effect2" keys to correlation values). | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with columns group, effect1, effect2, corr. |
build_varying_offsets_dataframe
build_varying_offsets_dataframe(varying_offsets: VaryingState) -> pl.DataFrame
Build the .varying_offsets DataFrame from varying state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
varying_offsets | VaryingState | Varying state with grid, effects, and optional PIs. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with group, level, effect columns, and optional prediction interval columns. |
build_varying_params_dataframe
build_varying_params_dataframe(bundle: DataBundle, fit: FitState, varying_offsets: VaryingState) -> pl.DataFrame
Build the .varying_params DataFrame (population + offsets).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
bundle | DataBundle | Data bundle containing X_names for population param lookup. | required |
fit | FitState | Fit state containing coef (population coefficients). | required |
varying_offsets | VaryingState | Varying state with grid and per-group effects. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with group, level, and effect columns where each value is population_param + BLUP. |
build_varying_spread_dataframe
build_varying_spread_dataframe(varying_spread: VaryingSpreadState) -> pl.DataFrame
Build the .varying_spread DataFrame from variance components.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
varying_spread | VaryingSpreadState | Variance component state with components DataFrame and optional CI information. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with component, estimate, and optional ci_lower, ci_upper, ci_method columns. |
resamples¶
Builder functions for resamples-related containers.
Provides constructors for ResamplesState and helpers for building resamples from inference results. Moved from state.py to keep modules under the 800-line limit.
Functions:
| Name | Description |
|---|---|
build_mee_resamples | Build ResamplesState from MEE inference if samples are available. |
build_params_resamples | Build ResamplesState from params inference if samples are available. |
build_resamples_dataframe | Build a long-format DataFrame of raw resampled values. |
build_resamples_state | Build a ResamplesState from resampling results. |
extract_mee_names | Extract human-readable names from a MeeState. |
Functions
build_mee_resamples
build_mee_resamples(mee: MeeState | None, samples: np.ndarray | None, how: str) -> ResamplesState | None
Build ResamplesState from MEE inference if samples are available.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mee | MeeState | None | The MeeState from explore, or None. | required |
samples | ndarray | None | Raw resample array from dispatch_mee_inference, or None. | required |
how | str | Inference method used ("boot", "perm", etc.). | required |
Returns:
| Type | Description |
|---|---|
ResamplesState | None | ResamplesState if boot/perm samples were saved, else None. |
build_params_resamples
build_params_resamples(inference: InferenceState | None, fit_coef: np.ndarray, x_names: tuple[str, ...], how: str) -> ResamplesState | None
Build ResamplesState from params inference if samples are available.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
inference | InferenceState | None | The InferenceState from params inference, or None. | required |
fit_coef | ndarray | Coefficient estimates from the FitState. | required |
x_names | tuple[str, ...] | Design matrix column names from the DataBundle. | required |
how | str | Inference method used ("boot", "perm", etc.). | required |
Returns:
| Type | Description |
|---|---|
ResamplesState | None | ResamplesState if boot/perm samples were saved, else None. |
build_resamples_dataframe
build_resamples_dataframe(rs: ResamplesState) -> pl.DataFrame
Build a long-format DataFrame of raw resampled values.
Returns one row per (resample, term) combination with the raw resampled value: coefficient estimates for bootstrap, null t-statistics for permutation.
Columns: resample (int), term (str), value (float).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
rs | ResamplesState | Frozen ResamplesState from bootstrap or permutation inference. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | Polars DataFrame with n_resamples × k rows, where k is the number of terms/effects. |
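The n_resamples × k row count follows from emitting one row per cell of the (n_resamples, k) samples array. A stdlib sketch of that reshaping, with tuples standing in for DataFrame rows; `resamples_long` is a hypothetical name, and whether the real resample column is 0- or 1-based is not stated, so 0-based here is an assumption.

```python
from __future__ import annotations


def resamples_long(samples: list[list[float]], names: list[str]) -> list[tuple[int, str, float]]:
    """Flatten an (n_resamples, k) matrix into (resample, term, value) rows."""
    k = len(names)
    rows: list[tuple[int, str, float]] = []
    for r, draw in enumerate(samples):  # resample index is 0-based here (assumption)
        if len(draw) != k:
            raise ValueError(f"resample {r} has {len(draw)} values, expected {k}")
        for term, value in zip(names, draw):
            rows.append((r, term, value))
    return rows


rows = resamples_long([[1.0, 2.0], [1.5, 2.5], [0.5, 1.5]], ["Intercept", "x"])
```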
Examples:
>>> df = build_resamples_dataframe(rs)  # rs: a ResamplesState with 100 resamples of 10 terms
>>> df.columns
['resample', 'term', 'value']
>>> df.shape
(1000, 3)
build_resamples_state
build_resamples_state(*, samples: NDArray[np.floating], observed: NDArray[np.floating], names: tuple[str, ...] | list[str], method: str, n_resamples: int, context: str) -> ResamplesState
Build a ResamplesState from resampling results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
samples | NDArray[floating] | Resampled statistics array, shape (n_resamples, k). | required |
observed | NDArray[floating] | Observed statistics, shape (k,). | required |
names | tuple[str, ...] | list[str] | Term/effect names corresponding to columns of samples. | required |
method | str | Resampling method ("boot" or "perm"). | required |
n_resamples | int | Number of resamples. | required |
context | str | What was resampled ("params" or "effects"). | required |
Returns:
| Type | Description |
|---|---|
ResamplesState | Frozen ResamplesState instance. |
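The "validate, then freeze" pattern these builders follow can be sketched with the standard library. This uses `dataclasses.dataclass(frozen=True)` as a stand-in for the attrs container; `ResamplesSketch`, `build_resamples_sketch`, and the specific checks are hypothetical, derived only from the shapes and allowed values listed in the table above. Which checks the real builder performs is not documented here.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ResamplesSketch:
    """Hypothetical stand-in for the frozen attrs ResamplesState."""
    samples: tuple[tuple[float, ...], ...]
    observed: tuple[float, ...]
    names: tuple[str, ...]
    method: str
    n_resamples: int
    context: str


def build_resamples_sketch(*, samples: list[list[float]], observed: list[float],
                           names: tuple[str, ...] | list[str], method: str,
                           n_resamples: int, context: str) -> ResamplesSketch:
    """Validate primitive inputs, then return an immutable container."""
    if method not in ("boot", "perm"):
        raise ValueError(f"method must be 'boot' or 'perm', got {method!r}")
    if context not in ("params", "effects"):
        raise ValueError(f"context must be 'params' or 'effects', got {context!r}")
    k = len(names)
    if len(observed) != k or any(len(row) != k for row in samples):
        raise ValueError("samples columns and observed length must match names")
    if len(samples) != n_resamples:
        raise ValueError("samples must have n_resamples rows")
    return ResamplesSketch(
        samples=tuple(tuple(row) for row in samples),
        observed=tuple(observed),
        names=tuple(names),  # accept a tuple or a list, always store a tuple
        method=method,
        n_resamples=n_resamples,
        context=context,
    )


state = build_resamples_sketch(samples=[[0.1, 0.2], [0.3, 0.4]], observed=[1.0, 2.0],
                               names=["Intercept", "x"], method="boot",
                               n_resamples=2, context="params")
try:
    state.method = "perm"  # frozen: attribute assignment is rejected
except FrozenInstanceError:
    print("immutable")
```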
Examples:
>>> import numpy as np
>>> state = build_resamples_state(
... samples=np.random.randn(100, 2),
... observed=np.array([1.0, 2.0]),
... names=("Intercept", "x"),
... method="boot",
... n_resamples=100,
... context="params",
... )
>>> state.method
'boot'
extract_mee_names
extract_mee_names(mee: MeeState) -> tuple[str, ...]
Extract human-readable names from a MeeState.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mee | MeeState | The MeeState to extract names from. | required |
Returns:
| Type | Description |
|---|---|
tuple[str, ...] | Tuple of effect names for each estimate. |
results¶
Result DataFrame assembly utilities.
Shared helpers for building user-facing DataFrames from internal state
containers. These are used by the model property accessors (effects,
predictions, etc.) to assemble Polars DataFrames with optional
inference columns.
Functions:
| Name | Description |
|---|---|
append_inference_columns | Append standard inference columns to a DataFrame if available. |
Functions
append_inference_columns
append_inference_columns(df: pl.DataFrame, state: object, method: str | None = None) -> pl.DataFrame
Append standard inference columns to a DataFrame if available.
Checks state.has_inference and, when True, adds each inference
column that is not None on state. Columns are added in canonical
order: se, ci_lower, ci_upper, statistic, df, p_value.
When method is "perm", the ci_lower and ci_upper
columns are excluded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
df | DataFrame | Base DataFrame to augment (not mutated). | required |
state | object | Object with a has_inference bool and optional se, statistic, df, p_value, ci_lower, ci_upper array attributes. | required |
method | str | None | Inference method ("asymp", "boot", "perm", or None). Controls which columns are included. | None |
Returns:
| Type | Description |
|---|---|
DataFrame | A new DataFrame with inference columns appended (or the original DataFrame unchanged if inference is not available). |
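The column-selection rule described above can be sketched without Polars. This hypothetical `inference_columns` helper returns only the names of the columns that would be appended, in the documented canonical order; the real function returns a new DataFrame. A `SimpleNamespace` stands in for the state object.

```python
from __future__ import annotations

from types import SimpleNamespace

CANONICAL_ORDER = ("se", "ci_lower", "ci_upper", "statistic", "df", "p_value")


def inference_columns(state: object, method: str | None = None) -> list[str]:
    """Names of the inference columns that would be appended, in canonical order."""
    if not getattr(state, "has_inference", False):
        return []  # inference not available: the DataFrame comes back unchanged
    skip = {"ci_lower", "ci_upper"} if method == "perm" else set()
    return [
        name for name in CANONICAL_ORDER
        if name not in skip and getattr(state, name, None) is not None
    ]


demo = SimpleNamespace(has_inference=True, se=[0.1], ci_lower=[0.3], ci_upper=[0.7],
                       statistic=[5.0], df=None, p_value=[0.01])
print(inference_columns(demo, method="perm"))  # ['se', 'statistic', 'p_value']
```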
specs¶
Builder functions for specification containers.
Functions:
| Name | Description |
|---|---|
build_model_spec | Build a ModelSpec from raw inputs. |
build_model_spec_from_formula | Build ModelSpec from a pre-parsed formula structure and resolve defaults. |
build_simulation_spec | Build a SimulationSpec for data generation. |
build_simulation_spec_from_formula | Build SimulationSpec from formula with defaults for unspecified variables. |
build_varying_spec | Build a VaryingSpec for random effect structure. |
get_varying_random_terms | Get all random terms (Intercept + slope terms) for a VaryingSpec. |
strip_backticks | Remove surrounding backtick quotes from a name. |
Functions
build_model_spec
build_model_spec(formula: str, *, family: str = 'gaussian', link: str | None = None, method: str | None = None, response_var: str | None = None, fixed_terms: tuple[str, ...] | list[str] | None = None, random_terms: tuple[str, ...] | list[str] | None = None, has_random_effects: bool | None = None) -> ModelSpec
Build a ModelSpec from raw inputs.
This factory function handles defaults, validation, and inference of missing fields. In a full implementation, formula parsing would extract response_var, fixed_terms, and random_terms automatically.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | The model formula string. | required |
family | str | Distribution family (default: “gaussian”). | ‘gaussian’ |
link | str | None | Link function. If None, uses canonical link for family. | None |
method | str | None | Estimation method. If None, inferred from family and RE. | None |
response_var | str | None | Response variable name. Required if not parsed. | None |
fixed_terms | tuple[str, ...] | list[str] | None | Fixed effect terms. Required if not parsed. | None |
random_terms | tuple[str, ...] | list[str] | None | Random effect terms (default: empty tuple). | None |
has_random_effects | bool | None | Whether model has RE. Inferred from random_terms. | None |
Returns:
| Type | Description |
|---|---|
ModelSpec | A validated ModelSpec instance. |
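Two of the defaults in the table above can be sketched directly: the canonical link for the family and `has_random_effects` inferred from `random_terms`. This is a sketch under assumptions: `resolve_defaults` is hypothetical, the family spellings other than "gaussian" are assumed, and method inference is left out because only the gaussian, no-RE case (`'ols'`) is documented here.

```python
from __future__ import annotations

# Textbook canonical links; family spellings other than "gaussian" are assumptions.
CANONICAL_LINKS = {"gaussian": "identity", "binomial": "logit", "poisson": "log"}


def resolve_defaults(
    family: str = "gaussian",
    link: str | None = None,
    random_terms: tuple[str, ...] | list[str] | None = None,
    has_random_effects: bool | None = None,
) -> tuple[str, tuple[str, ...], bool]:
    """Fill in link, random_terms and has_random_effects when not given."""
    terms = tuple(random_terms or ())  # default: empty tuple
    if link is None:
        try:
            link = CANONICAL_LINKS[family]
        except KeyError:
            raise ValueError(f"no canonical link known for family {family!r}") from None
    if has_random_effects is None:
        has_random_effects = len(terms) > 0
    return link, terms, has_random_effects
```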
Examples:
>>> spec = build_model_spec(
... formula="y ~ x + treatment",
... response_var="y",
... fixed_terms=["Intercept", "x", "treatment"],
... )
>>> spec.method
'ols'
build_model_spec_from_formula
build_model_spec_from_formula(formula: str, *, family: str = 'gaussian', link: str | None = None, method: str | None = None, structure: FormulaStructure) -> ModelSpec
Build ModelSpec from a pre-parsed formula structure and resolve defaults.
The caller must parse the formula into a FormulaStructure first
(via extract_formula_structure). This keeps containers/ free
of imports from formula/.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style model formula (e.g., `"y ~ x + (1 \| group)"`). | required |
family | str | Distribution family (default: "gaussian"). | ‘gaussian’ |
link | str | None | Link function. If None, uses canonical link for family. | None |
method | str | None | Estimation method. If None, inferred from family and RE presence. Validated against family/RE constraints if specified. | None |
structure | FormulaStructure | Pre-parsed formula structure from extract_formula_structure(formula). | required |
Returns:
| Type | Description |
|---|---|
ModelSpec | A validated ModelSpec instance. |
Examples:
>>> from parse import extract_formula_structure
>>> s = extract_formula_structure("y ~ x + treatment")
>>> spec = build_model_spec_from_formula("y ~ x + treatment", structure=s)
>>> spec.method
'ols'
build_simulation_spec
build_simulation_spec(n: int, *, distributions: dict[str, Distribution] | None = None, coef: dict[str, float] | None = None, sigma: float = 1.0, re_spec: dict[str, VaryingSpec] | None = None, seed: int | None = None) -> SimulationSpec
Build a SimulationSpec for data generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n | int | Total number of observations. | required |
distributions | dict[str, Distribution] | None | Variable name -> Distribution mappings. | None |
coef | dict[str, float] | None | Coefficient name -> value mappings. | None |
sigma | float | Residual standard deviation. | 1.0 |
re_spec | dict[str, VaryingSpec] | None | Grouping variable -> VaryingSpec mappings. | None |
seed | int | None | Random seed for reproducibility. | None |
Returns:
| Type | Description |
|---|---|
SimulationSpec | SimulationSpec instance. |
Examples:
>>> spec = build_simulation_spec(n=100)
build_simulation_spec_from_formula
build_simulation_spec_from_formula(formula: str, n: int, *, distributions: dict[str, Distribution] | None = None, coef: dict[str, float] | None = None, sigma: float = 1.0, seed: int | None = None) -> SimulationSpec
Build SimulationSpec from formula with defaults for unspecified variables.
Parses the formula to identify variable types and creates appropriate default distributions for variables not explicitly specified.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | Model formula (e.g., `"y ~ x + factor(group) + (1 \| subject)"`). | required |
n | int | Number of observations to generate. | required |
distributions | dict[str, Distribution] | None | User-provided distributions for specific variables. | None |
coef | dict[str, float] | None | True coefficient values (defaults to all zeros). | None |
sigma | float | Residual standard deviation. | 1.0 |
seed | int | None | Random seed. | None |
Returns:
| Type | Description |
|---|---|
SimulationSpec | SimulationSpec ready for data generation. |
Examples:
>>> spec = build_simulation_spec_from_formula("y ~ x + factor(group)", n=100)
>>> spec.n
100
build_varying_spec
build_varying_spec(n: int, sd: float = 1.0, *, slope_sds: dict[str, float] | None = None, correlations: dict[tuple[str, str], float] | None = None, n_per: int | None = None) -> VaryingSpec
Build a VaryingSpec for random effect structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n | int | Number of groups. | required |
sd | float | Standard deviation for random intercept. | 1.0 |
slope_sds | dict[str, float] | None | Dictionary of term -> slope SD mappings. | None |
correlations | dict[tuple[str, str], float] | None | Dictionary of (term1, term2) -> correlation mappings. | None |
n_per | int | None | Number of units per group for nested effects. | None |
Returns:
| Type | Description |
|---|---|
VaryingSpec | VaryingSpec instance. |
Examples:
>>> spec = build_varying_spec(n=50, sd=0.3)
get_varying_random_terms
get_varying_random_terms(spec: VaryingSpec) -> tuple[str, ...]
Get all random terms (Intercept + slope terms) for a VaryingSpec.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | VaryingSpec | A VaryingSpec instance. | required |
Returns:
| Type | Description |
|---|---|
tuple[str, ...] | Tuple of random term names, starting with “Intercept”. |
strip_backticks
strip_backticks(name: str) -> str
Remove surrounding backtick quotes from a name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name | str | A column name, possibly surrounded by backticks. | required |
Returns:
| Type | Description |
|---|---|
str | The name with backticks stripped if present. |
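Backtick quoting is the usual R-formula convention for non-syntactic names such as `` `my var` ``. A minimal sketch of the documented behavior, assuming exactly one matched pair at both ends is removed; how the real function treats unmatched or doubled backticks is not documented, and `strip_backticks_sketch` is a hypothetical name.

```python
def strip_backticks_sketch(name: str) -> str:
    """Drop one pair of surrounding backticks, e.g. "`my var`" -> "my var"."""
    if len(name) >= 2 and name.startswith("`") and name.endswith("`"):
        return name[1:-1]
    return name  # not backtick-quoted: returned unchanged
```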
state¶
Builder functions for computation state containers.
Functions:
| Name | Description |
|---|---|
build_cv_state | Build a CVState from cross-validation computation. |
build_fit_state | Build a FitState instance with validation. |
build_inference_state | Build an InferenceState from computed inference values. |
build_joint_test_state | Build a JointTestState from computed joint test values. |
build_mee_state | Build a MeeState from marginal effects computation. |
build_prediction_state | Build a PredictionState from prediction computation. |
build_simulation_inference_state | Build a SimulationInferenceState from computed values. |
build_varying_spread_state | Build a VaryingSpreadState from variance component estimates. |
build_varying_state | Build a VaryingState from computed BLUPs. |
Functions
build_cv_state
build_cv_state(k: int, rmse: float, mae: float, r_squared: float, *, deviance: float | None = None, accuracy: float | None = None, sensitivity: float | None = None, specificity: float | None = None, f1: float | None = None, auc: float | None = None, fold_metrics: dict[str, np.ndarray] | None = None, oos_predictions: np.ndarray | None = None, oos_residuals: np.ndarray | None = None, fold_assignments: np.ndarray | None = None) -> CVState
Build a CVState from cross-validation computation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
k | int | Number of folds used. | required |
rmse | float | Root mean squared error. | required |
mae | float | Mean absolute error. | required |
r_squared | float | Coefficient of determination. | required |
deviance | float | None | Mean deviance (GLM only). | None |
accuracy | float | None | Classification accuracy (binomial only). | None |
sensitivity | float | None | Sensitivity / true positive rate (binomial only). | None |
specificity | float | None | Specificity / true negative rate (binomial only). | None |
f1 | float | None | F1 score (binomial only). | None |
auc | float | None | Area under ROC curve (binomial only). | None |
fold_metrics | dict[str, ndarray] | None | Per-fold metrics dictionary. | None |
oos_predictions | ndarray | None | Out-of-sample predictions. | None |
oos_residuals | ndarray | None | Out-of-sample residuals. | None |
fold_assignments | ndarray | None | Array indicating which fold each observation belongs to. | None |
Returns:
| Type | Description |
|---|---|
CVState | Frozen CVState instance. |
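The three required metrics can be computed from out-of-sample predictions with the standard formulas RMSE = sqrt(mean(r^2)), MAE = mean(|r|) and R^2 = 1 - SS_res / SS_tot. A stdlib sketch, assuming the residuals are pooled across all folds; whether the library pools like this or averages per-fold values (see fold_metrics) is not stated, and `cv_metrics` is a hypothetical name.

```python
from __future__ import annotations

import math


def cv_metrics(y: list[float], oos_pred: list[float]) -> dict[str, float]:
    """RMSE, MAE and R^2 from pooled out-of-sample predictions."""
    if not y or len(y) != len(oos_pred):
        raise ValueError("y and oos_pred must be non-empty and of equal length")
    resid = [yi - pi for yi, pi in zip(y, oos_pred)]  # out-of-sample residuals
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant response")
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in resid) / n,
        "r_squared": 1.0 - ss_res / ss_tot,
    }


metrics = cv_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```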
Examples:
>>> state = build_cv_state(
... k=10,
... rmse=0.523,
... mae=0.412,
... r_squared=0.891,
... )
build_fit_state
build_fit_state(*, coef: NDArray[np.floating], vcov: NDArray[np.floating], fitted: NDArray[np.floating], residuals: NDArray[np.floating], leverage: NDArray[np.floating], df_resid: float, loglik: float, converged: bool = True, n_iter: int = 1, sigma: float | None = None, dispersion: float | None = None, null_deviance: float | None = None, deviance: float | None = None, theta: NDArray[np.floating] | None = None, u: NDArray[np.floating] | None = None, irls_weights: NDArray[np.floating] | None = None, XtWX_inv: NDArray[np.floating] | None = None) -> FitState
Build a FitState instance with validation.
This builder function provides a keyword-only interface for constructing FitState instances, ensuring all required fields are explicitly provided.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
coef | NDArray[floating] | Coefficient estimates (1D array of length p). | required |
vcov | NDArray[floating] | Variance-covariance matrix (p x p array). | required |
fitted | NDArray[floating] | Fitted values (1D array of length n). | required |
residuals | NDArray[floating] | Residuals (1D array of length n). | required |
leverage | NDArray[floating] | Hat matrix diagonal / leverage values (1D array of length n). | required |
df_resid | float | Residual degrees of freedom. | required |
loglik | float | Log-likelihood at convergence. | required |
converged | bool | Whether the optimization converged. | True |
n_iter | int | Number of iterations (1 for closed-form solutions). | 1 |
sigma | float | None | Residual standard deviation (OLS models only). | None |
dispersion | float | None | Dispersion parameter (GLM models only). | None |
null_deviance | float | None | Null model deviance (GLM models only). | None |
deviance | float | None | Residual deviance, sum of unit deviances (GLM models only). | None |
theta | NDArray[floating] | None | Random effect variance parameters (mixed models only). | None |
u | NDArray[floating] | None | Spherical random effects (mixed models only). | None |
irls_weights | NDArray[floating] | None | IRLS weights from GLM fit (GLM sandwich estimator). | None |
XtWX_inv | NDArray[floating] | None | Inverse of X’WX from GLM fit (GLM sandwich estimator). | None |
Returns:
| Type | Description |
|---|---|
FitState | A new FitState instance. |
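The parameter table fixes the expected shapes: coef has length p, vcov is p × p, and fitted, residuals and leverage all have length n. A sketch of a shape check built only from those documented shapes, using nested lists instead of NumPy arrays; `check_fit_shapes` is hypothetical, and the checks the real builder runs are not listed here.

```python
from __future__ import annotations


def check_fit_shapes(coef: list[float], vcov: list[list[float]], fitted: list[float],
                     residuals: list[float], leverage: list[float]) -> tuple[int, int]:
    """Return (n, p) after checking the documented array shapes."""
    p = len(coef)
    if len(vcov) != p or any(len(row) != p for row in vcov):
        raise ValueError(f"vcov must be {p} x {p}")
    n = len(fitted)
    for label, arr in (("residuals", residuals), ("leverage", leverage)):
        if len(arr) != n:
            raise ValueError(f"{label} must have length {n}, got {len(arr)}")
    return n, p
```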
Examples:
>>> import numpy as np
>>> from state import build_fit_state
>>> state = build_fit_state(
... coef=np.array([1.0, 2.0]),
... vcov=np.eye(2),
... fitted=np.array([1.0, 2.0, 3.0]),
... residuals=np.array([0.1, -0.1, 0.0]),
... leverage=np.array([0.3, 0.3, 0.4]),
... df_resid=1.0,
... loglik=-10.0,
... sigma=0.5,
... )
>>> state.sigma
0.5
build_inference_state
build_inference_state(se: np.ndarray, statistic: np.ndarray, df: np.ndarray, p_value: np.ndarray, ci_lower: np.ndarray, ci_upper: np.ndarray, *, conf_level: float = 0.95, method: str = 'asymp', null: float = 0.0, alternative: str = 'two-sided', n_resamples: int | None = None, boot_samples: np.ndarray | None = None, perm_samples: np.ndarray | None = None, pre: np.ndarray | None = None, pre_sd: np.ndarray | None = None) -> InferenceState
Build an InferenceState from computed inference values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
se | ndarray | Standard errors for each coefficient. | required |
statistic | ndarray | Test statistics (t or z). | required |
df | ndarray | Degrees of freedom. | required |
p_value | ndarray | P-values. | required |
ci_lower | ndarray | Lower confidence interval bounds. | required |
ci_upper | ndarray | Upper confidence interval bounds. | required |
conf_level | float | Confidence level (default 0.95). | 0.95 |
method | str | Inference method (“asymp”, “boot”, “perm”, “cv”). | ‘asymp’ |
null | float | Null hypothesis value (default 0.0). | 0.0 |
alternative | str | Alternative hypothesis direction (default “two-sided”). | ‘two-sided’ |
n_resamples | int \| None | Number of bootstrap/permutation resamples. | None |
boot_samples | ndarray \| None | Raw bootstrap samples. | None |
perm_samples | ndarray \| None | Null distribution of test statistics from permutation tests. | None |
pre | ndarray \| None | Proportional reduction in error (PRE) per coefficient, from CV ablation. | None |
pre_sd | ndarray \| None | Standard deviation of PRE across CV folds (CV ablation). | None |
Returns:
| Type | Description |
|---|---|
InferenceState | Frozen InferenceState instance. |
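The first six arguments are typically derived from a coefficient vector and its covariance matrix. As a minimal sketch of the method="asymp", alternative="two-sided" case, the hypothetical helper wald_inference computes Wald z-statistics, normal-reference p-values, and symmetric confidence bounds. It fills df with np.inf as a placeholder for "normal reference", which is an assumption about convention. A second hypothetical helper shows one common way to turn perm_samples (a null distribution of the statistic) into a two-sided permutation p-value, using the add-one correction so the p-value is never exactly zero. Neither helper is part of the documented API.

```python
from statistics import NormalDist

import numpy as np


def wald_inference(coef, vcov, conf_level=0.95, null=0.0):
    """Hypothetical helper: two-sided Wald z inference per coefficient."""
    z = NormalDist()
    se = np.sqrt(np.diag(vcov))
    statistic = (coef - null) / se
    # Two-sided p-value; cdf(-|s|) avoids cancellation for large |s|
    p_value = np.array([2.0 * z.cdf(-abs(s)) for s in statistic])
    crit = z.inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    return {
        "se": se,
        "statistic": statistic,
        "df": np.full(se.shape, np.inf),  # placeholder: normal reference
        "p_value": p_value,
        "ci_lower": coef - crit * se,
        "ci_upper": coef + crit * se,
    }


def permutation_p_value(observed, perm_samples):
    """Hypothetical helper: two-sided p-value from (n_resamples, n_coef) null draws."""
    exceed = np.sum(np.abs(perm_samples) >= np.abs(observed), axis=0)
    return (exceed + 1) / (perm_samples.shape[0] + 1)


inputs = wald_inference(np.array([0.5, 0.5]), np.diag([0.01, 0.04]))
```

The keys of inputs match the first six parameter names, so the result could be unpacked as build_inference_state(**inputs). Its statistics equal those in the example below; the p-values and bounds differ because this sketch uses a normal rather than a t reference distribution.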
Examples:
>>> state = build_inference_state(
... se=np.array([0.1, 0.2]),
... statistic=np.array([5.0, 2.5]),
... df=np.array([98.0, 98.0]),
... p_value=np.array([0.001, 0.014]),
... ci_lower=np.array([0.3, 0.1]),
... ci_upper=np.array([0.7, 0.9]),
... )

build_joint_test_state
build_joint_test_state(terms: tuple[str, ...] | list[str], df1: np.ndarray, statistic: np.ndarray, p_value: np.ndarray, *, test_type: str = 'F', ss_type: str = 'III', df2: np.ndarray | None = None) -> JointTestState
Build a JointTestState from computed joint test values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
terms | tuple[str, ...] \| list[str] | Names of terms being tested. | required |
df1 | ndarray | Numerator degrees of freedom per term. | required |
statistic | ndarray | Test statistic values (F or chi2). | required |
p_value | ndarray | P-values for each term. | required |
test_type | str | Type of test (“F” for linear models, “chi2” for GLMs). | ‘F’ |
ss_type | str | Sum of squares type (“II” or “III”). | ‘III’ |
df2 | ndarray \| None | Denominator degrees of freedom (required for F-tests). | None |
Returns:
| Type | Description |
|---|---|
JointTestState | Frozen JointTestState instance. |
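The statistic and df1 values for one term can be obtained from a Wald test of H0: Lβ = 0, where the rows of L pick out the coefficients belonging to that term. The sketch below, using the hypothetical helper wald_joint_test, returns df1 = rank(L), the chi-square form W = (Lβ)'(LVL')^-1(Lβ) used with test_type="chi2", and W/df1, the F form used with a denominator df2 when test_type="F". How terms map to rows of L under ss_type "II" versus "III" is not shown and is assumed to be handled elsewhere.

```python
import math

import numpy as np


def wald_joint_test(coef, vcov, L):
    """Hypothetical helper: Wald test of L @ coef == 0.

    Assumes L has full row rank so that L V L' is invertible.
    """
    L = np.atleast_2d(L)
    Lb = L @ coef
    df1 = int(np.linalg.matrix_rank(L))
    chi2 = float(Lb @ np.linalg.solve(L @ vcov @ L.T, Lb))
    return df1, chi2, chi2 / df1  # (df1, chi-square statistic, F statistic)


df1, chi2, f_stat = wald_joint_test(
    coef=np.array([1.0, 2.0]), vcov=np.diag([0.25, 1.0]), L=np.eye(2)
)

# For df1 == 2 the chi-square upper tail has the closed form exp(-x / 2),
# e.g. for the chi2 = 8.5 row of the GLM example below:
p_two_df = math.exp(-8.5 / 2)
```

Here p_two_df is about 0.0143, consistent with the p_value of 0.014 reported for that row.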
Examples:
F-test results (linear model):
>>> state = build_joint_test_state(
... terms=("a", "b", "a:b"),
... df1=np.array([2, 1, 2]),
... df2=np.array([94, 94, 94]),
... statistic=np.array([5.2, 12.1, 0.8]),
... p_value=np.array([0.007, 0.001, 0.45]),
... test_type="F",
... )

Chi-square results (GLM):
>>> state = build_joint_test_state(
... terms=("a", "b"),
... df1=np.array([2, 1]),
... statistic=np.array([8.5, 15.2]),
... p_value=np.array([0.014, 0.0001]),
... test_type="chi2",
... )

build_mee_state
build_mee_state(grid: 'pl.DataFrame', estimate: np.ndarray, explore_formula: str, focal_var: str, mee_type: str, *, how: str = 'mem', effect_scale: str = 'link', L_matrix: np.ndarray | None = None, contrast_method: str | None = None, n_contrast_levels: int | None = None, link: str | None = None, L_matrix_link: np.ndarray | None = None, boot_X_plus: np.ndarray | None = None, boot_X_minus: np.ndarray | None = None, boot_delta: float | None = None, se: np.ndarray | None = None, df: np.ndarray | None = None, statistic: np.ndarray | None = None, p_value: np.ndarray | None = None, ci_lower: np.ndarray | None = None, ci_upper: np.ndarray | None = None, conf_level: float | None = None) -> MeeState
Build a MeeState from marginal effects computation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
grid | ‘pl.DataFrame’ | Polars DataFrame with the evaluation grid. | required |
estimate | ndarray | Point estimates for each grid row. | required |
explore_formula | str | The explore formula string. | required |
focal_var | str | The primary variable being explored. | required |
mee_type | str | Type of effect (“means”, “slopes”, “contrasts”). | required |
how | str | Averaging method: "mem" (Marginal Estimated Mean, balanced reference grid) or "ame" (Average Marginal Effect, g-computation over observed data). | ‘mem’ |
effect_scale | str | Scale of estimates: "link" (linear predictor) or "response" (inverse-link / data scale). | ‘link’ |
L_matrix | ndarray \| None | Design matrix for delta-method inference (optional), with shape (n_estimates, n_coef). For EMMs this is X_ref. | None |
contrast_method | str \| None | Original contrast type for multiplicity adjustment (“pairwise”, “sequential”, “poly”, “treatment”, “sum”, “helmert”, or None). | None |
n_contrast_levels | int \| None | Number of EMM levels before contrasting (family size). | None |
link | str \| None | Link function name for response-scale CI back-transformation. | None |
L_matrix_link | ndarray \| None | Link-scale L_matrix for CI back-transformation. | None |
boot_X_plus | ndarray \| None | Per-combo average design matrix at focal_var + delta/2, used to recompute response-scale AMEs exactly in each bootstrap replicate. | None |
boot_X_minus | ndarray \| None | Per-combo average design matrix at focal_var - delta/2. | None |
boot_delta | float \| None | Finite-difference step size for bootstrap slope recomputation. | None |
se | ndarray \| None | Standard errors (optional, from .infer()). | None |
df | ndarray \| None | Degrees of freedom (optional). | None |
statistic | ndarray \| None | Test statistics (optional). | None |
p_value | ndarray \| None | P-values (optional). | None |
ci_lower | ndarray \| None | Lower CI bounds (optional). | None |
ci_upper | ndarray \| None | Upper CI bounds (optional). | None |
conf_level | float \| None | Confidence level (optional). | None |
Returns:
| Type | Description |
|---|---|
MeeState | Frozen MeeState instance. |
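The boot_X_plus, boot_X_minus, and boot_delta fields support a centered finite-difference slope. As an illustration only, the sketch below assumes a logistic link and per-observation design matrices whose focal column is shifted by +/- delta/2. The source describes per-combo averaged matrices, whose exact layout is not reproduced here. Under these assumptions, the response-scale average marginal effect can be recomputed for any coefficient vector, such as one bootstrap replicate. The helper names inv_logit and finite_difference_ame are hypothetical.

```python
import numpy as np


def inv_logit(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def finite_difference_ame(beta, X_plus, X_minus, delta):
    """Hypothetical helper: response-scale AME by centered differences."""
    mu_plus = inv_logit(X_plus @ beta)
    mu_minus = inv_logit(X_minus @ beta)
    return float(np.mean((mu_plus - mu_minus) / delta))


rng = np.random.default_rng(1)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])  # intercept + focal variable
beta = np.array([-0.5, 1.2])
delta = 1e-4

# Shift only the focal column by +/- delta / 2
X_plus, X_minus = X.copy(), X.copy()
X_plus[:, 1] += delta / 2
X_minus[:, 1] -= delta / 2

ame = finite_difference_ame(beta, X_plus, X_minus, delta)

# Analytic check for the logit link: d mu / d x = beta_1 * mu * (1 - mu)
mu = inv_logit(X @ beta)
analytic_ame = float(np.mean(beta[1] * mu * (1 - mu)))
```

On the link scale, the same difference reduces to the focal coefficient itself. The response scale is therefore the case where the shifted matrices are needed.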
Examples:
>>> import polars as pl
>>> grid = pl.DataFrame({"treatment": ["A", "B", "C"]})
>>> state = build_mee_state(
... grid=grid,
... estimate=np.array([1.0, 2.0, 3.0]),
... explore_formula="treatment",
... focal_var="treatment",
... mee_type="means",
... )
>>> state.has_inference
False

build_prediction_state
build_prediction_state(fitted: np.ndarray, *, link: np.ndarray | None = None, X_pred: np.ndarray | None = None, config: PredictionConfig | None = None, se: np.ndarray | None = None, ci_lower: np.ndarray | None = None, ci_upper: np.ndarray | None = None, interval_type: str | None = None, conf_level: float | None = None, grid: 'pl.DataFrame | None' = None) -> PredictionState
Build a PredictionState from prediction computation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fitted | ndarray | Predicted values on response scale. | required |
link | ndarray \| None | Predicted values on link scale (for GLM/GLMM). | None |
X_pred | ndarray \| None | Design matrix used for predictions. Stored so that .infer() can compute delta-method SEs on the correct X. | None |
config | PredictionConfig \| None | Prediction configuration for bootstrap replay. | None |
se | ndarray \| None | Standard errors of predictions. | None |
ci_lower | ndarray \| None | Lower interval bounds. | None |
ci_upper | ndarray \| None | Upper interval bounds. | None |
interval_type | str \| None | Type of interval (“confidence” or “prediction”). | None |
conf_level | float \| None | Confidence level for intervals. | None |
grid | ‘pl.DataFrame \| None’ | Grid DataFrame for formula-mode predictions. When present, build_predictions_dataframe() prepends these columns. | None |
Returns:
| Type | Description |
|---|---|
PredictionState | Frozen PredictionState instance. |
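Storing X_pred lets the delta-method standard error of each linear prediction be computed as sqrt(x_i' V x_i), where V is the coefficient covariance. A minimal sketch for the identity link follows. The helper name, the normal critical value, and adding sigma^2 for interval_type="prediction" are illustrative assumptions rather than the library's documented behavior.

```python
from statistics import NormalDist

import numpy as np


def linear_prediction_intervals(X_pred, coef, vcov, sigma=None, conf_level=0.95):
    """Hypothetical helper: delta-method intervals for identity-link predictions.

    Pass sigma to widen confidence intervals into prediction intervals.
    """
    fitted = X_pred @ coef
    # Row-wise x_i' V x_i without forming the full n x n matrix
    var = np.einsum("ij,jk,ik->i", X_pred, vcov, X_pred)
    if sigma is not None:
        var = var + sigma**2  # add new-observation noise
    se = np.sqrt(var)
    crit = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    return fitted, se, fitted - crit * se, fitted + crit * se


X_pred = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
coef = np.array([1.0, 1.0])
vcov = np.diag([0.01, 0.0])
fitted, se, lower, upper = linear_prediction_intervals(X_pred, coef, vcov)
```

For a GLM, a link-scale SE would typically be carried to the response scale either by multiplying by |d mu / d eta| or by back-transforming the link-scale bounds.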
Examples:
>>> state = build_prediction_state(
... fitted=np.array([1.0, 2.0, 3.0]),
... )
>>> state.has_inference
False
>>> # With inference
>>> state = build_prediction_state(
... fitted=np.array([1.0, 2.0, 3.0]),
... se=np.array([0.1, 0.1, 0.1]),
... ci_lower=np.array([0.8, 1.8, 2.8]),
... ci_upper=np.array([1.2, 2.2, 3.2]),
... interval_type="confidence",
... conf_level=0.95,
... )
>>> state.has_inference
True

build_simulation_inference_state
build_simulation_inference_state(sim_type: str, n_sims: int, *, sim_mean: np.ndarray | None = None, sim_sd: np.ndarray | None = None, sim_quantiles: dict[str, np.ndarray] | None = None, power: dict[str, float] | None = None, coverage: dict[str, float] | None = None, bias: dict[str, float] | None = None, rmse: dict[str, float] | None = None, alpha: float = 0.05, true_coef: dict[str, float] | None = None) -> SimulationInferenceState
Build a SimulationInferenceState from computed values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sim_type | str | Type of simulation (“post_fit” or “power_analysis”). | required |
n_sims | int | Number of simulations. | required |
sim_mean | ndarray \| None | Mean of simulated values per observation. | None |
sim_sd | ndarray \| None | SD of simulated values per observation. | None |
sim_quantiles | dict[str, ndarray] \| None | Dict of quantile name -> array mappings. | None |
power | dict[str, float] \| None | Dict of term name -> power mappings. | None |
coverage | dict[str, float] \| None | Dict of term name -> coverage mappings. | None |
bias | dict[str, float] \| None | Dict of term name -> bias mappings. | None |
rmse | dict[str, float] \| None | Dict of term name -> RMSE mappings. | None |
alpha | float | Significance level for power calculation. | 0.05 |
true_coef | dict[str, float] \| None | True coefficient values for coverage/bias. | None |
Returns:
| Type | Description |
|---|---|
SimulationInferenceState | Frozen SimulationInferenceState instance. |
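The power, coverage, bias, and rmse dictionaries summarize repeated fits to simulated data. For a single term, the usual definitions are sketched below. The sketch assumes arrays of per-simulation estimates, p-values, and interval bounds, plus the term's entry in true_coef. The helper name summarize_term is hypothetical, and counting a rejection as p < alpha (rather than p <= alpha) is a convention choice.

```python
import numpy as np


def summarize_term(estimates, p_values, ci_lower, ci_upper, true_value, alpha=0.05):
    """Hypothetical helper: simulation summaries for one term."""
    estimates = np.asarray(estimates, dtype=float)
    errors = estimates - true_value
    covered = (np.asarray(ci_lower) <= true_value) & (true_value <= np.asarray(ci_upper))
    return {
        "power": float(np.mean(np.asarray(p_values) < alpha)),  # rejection rate
        "coverage": float(np.mean(covered)),  # share of intervals containing truth
        "bias": float(np.mean(errors)),
        "rmse": float(np.sqrt(np.mean(errors**2))),
    }


summary = summarize_term(
    estimates=[0.9, 1.1, 1.0, 1.2],
    p_values=[0.01, 0.20, 0.03, 0.04],
    ci_lower=[0.8, 1.05, 0.7, 0.9],
    ci_upper=[1.0, 1.30, 1.3, 1.5],
    true_value=1.0,
)
```

When the true effect is zero, the same rejection-rate calculation estimates the Type I error rate rather than power.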
Examples:
>>> state = build_simulation_inference_state(
... sim_type="post_fit",
... n_sims=100,
... sim_mean=np.array([1.0, 2.0, 3.0]),
... sim_sd=np.array([0.1, 0.2, 0.3]),
... )

build_varying_spread_state
build_varying_spread_state(components: 'pl.DataFrame', sigma2: float, tau2: dict[str, float], *, rho: dict[str, float] | None = None, icc: float | None = None, ci_lower: dict[str, float] | None = None, ci_upper: dict[str, float] | None = None, conf_level: float | None = None, ci_method: str | None = None) -> VaryingSpreadState
Build a VaryingSpreadState from variance component estimates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
components | ‘pl.DataFrame’ | Polars DataFrame with component estimates. | required |
sigma2 | float | Residual variance. | required |
tau2 | dict[str, float] | Dict mapping effect names to variance estimates. | required |
rho | dict[str, float] \| None | Dict mapping effect pairs to correlations (optional). | None |
icc | float \| None | Intraclass correlation coefficient (optional). | None |
ci_lower | dict[str, float] \| None | Lower CI bounds (optional, from .infer()). | None |
ci_upper | dict[str, float] \| None | Upper CI bounds (optional, from .infer()). | None |
conf_level | float \| None | Confidence level (optional). | None |
ci_method | str \| None | CI method used (optional). | None |
Returns:
| Type | Description |
|---|---|
VaryingSpreadState | Frozen VaryingSpreadState instance. |
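For a random-intercept model, the icc argument is conventionally tau2 / (tau2 + sigma2), the share of total variance attributable to between-group differences. The sketch below uses a hypothetical helper name and reproduces the icc=0.33 used in the example from sigma2=1.0 and tau2={"Intercept": 0.5}. With random slopes, the ICC depends on covariate values, so this simple formula no longer applies directly.

```python
def random_intercept_icc(sigma2, tau2, key="Intercept"):
    """Hypothetical helper: ICC for a random-intercept-only model.

    sigma2 is the residual variance; tau2 maps effect names to variances.
    """
    tau2_0 = tau2[key]
    return tau2_0 / (tau2_0 + sigma2)


icc = random_intercept_icc(sigma2=1.0, tau2={"Intercept": 0.5})
```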
Examples:
>>> import polars as pl
>>> components = pl.DataFrame({
... "component": ["sigma2", "tau2_Intercept", "icc"],
... "estimate": [1.0, 0.5, 0.33],
... })
>>> state = build_varying_spread_state(
... components=components,
... sigma2=1.0,
... tau2={"Intercept": 0.5},
... icc=0.33,
... )

build_varying_state
build_varying_state(grid: 'pl.DataFrame', effects: dict[str, np.ndarray], grouping_var: str, n_groups: int, *, pi_lower: dict[str, np.ndarray] | None = None, pi_upper: dict[str, np.ndarray] | None = None, conf_level: float | None = None) -> VaryingState
Build a VaryingState from computed BLUPs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
grid | ‘pl.DataFrame’ | Polars DataFrame with group identifiers. | required |
effects | dict[str, ndarray] | Dict mapping effect names to BLUP arrays. | required |
grouping_var | str | Name of the grouping variable. | required |
n_groups | int | Number of groups. | required |
pi_lower | dict[str, ndarray] \| None | Lower prediction interval bounds (optional). | None |
pi_upper | dict[str, ndarray] \| None | Upper prediction interval bounds (optional). | None |
conf_level | float \| None | Confidence level for intervals (optional). | None |
Returns:
| Type | Description |
|---|---|
VaryingState | Frozen VaryingState instance. |
Examples:
>>> state = build_varying_state(
... grid=pl.DataFrame({"subject": ["S1", "S2", "S3"]}),
... effects={"Intercept": np.array([0.5, -0.3, 0.1])},
... grouping_var="subject",
... n_groups=3,
... )
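For intuition about what effects holds, consider the textbook one-way random-intercept model with known mean mu and known variance components. In that model, each group's BLUP is its group-mean deviation shrunk toward zero by the reliability tau2 / (tau2 + sigma2 / n_j). The conditional variance shrink * sigma2 / n_j then gives prediction intervals. The sketch below assumes known mu, sigma2, and tau2 and uses a normal critical value. It is not how the library computes BLUPs from a fitted mixed model, and the helper name is hypothetical.

```python
from statistics import NormalDist

import numpy as np


def random_intercept_blups(y, groups, mu, sigma2, tau2, conf_level=0.95):
    """Hypothetical helper: BLUPs and prediction intervals, known variances."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    labels = sorted(set(groups.tolist()))
    crit = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    blups, sds = [], []
    for g in labels:
        y_g = y[groups == g]
        n_g = y_g.size
        shrink = tau2 / (tau2 + sigma2 / n_g)  # reliability of the group mean
        blups.append(shrink * (y_g.mean() - mu))
        sds.append(np.sqrt(shrink * sigma2 / n_g))  # conditional SD of u_g
    blups, sds = np.array(blups), np.array(sds)
    return (
        labels,
        {"Intercept": blups},
        {"Intercept": blups - crit * sds},
        {"Intercept": blups + crit * sds},
    )


labels, effects, pi_lower, pi_upper = random_intercept_blups(
    y=[2.0, 2.0, 0.0, 0.0, 1.0, 1.0],
    groups=["S1", "S1", "S2", "S2", "S3", "S3"],
    mu=1.0,
    sigma2=1.0,
    tau2=1.0,
)
```

The outputs line up with the grid (one row per label), effects, pi_lower, and pi_upper arguments of build_varying_state, with n_groups equal to len(labels).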