Model fitting, diagnostics, convergence, varying parameters, and prediction.
Call chain:
model.fit() -> fit_model() -> resolve_solver() -> fit_ols_qr / fit_glm_irls / fit_lmer_pls / fit_glmer_pirls
Attributes:
| Name | Type | Description |
|---|---|---|
VALID_SOLVERS | frozenset[str] |
Classes:
| Name | Description |
|---|---|
FitResult | Immutable result of the fit lifecycle. |
Functions:
| Name | Description |
|---|---|
augment_data_with_diagnostics | Augment raw data with diagnostic columns after fit. |
build_mixed_post_fit_state | Compute BLUPs, variance components, and emit convergence warnings. |
build_predict_grid | Build a Cartesian-product prediction grid. |
check_convergence | Run convergence diagnostics on a fitted mixed model. |
compute_diagnostics | Compute model-level diagnostics as a single-row DataFrame. |
compute_metadata | Compute model metadata as a single-row DataFrame. |
compute_optimizer_diagnostics | Compute optimizer convergence diagnostics as a single-row DataFrame. |
compute_predictions_from_formula | Parse a predict formula, build the grid, compute predictions, and attach grid columns. |
compute_r_squared | Compute R-squared and adjusted R-squared from raw arrays. |
compute_varying_spread_state | Compute VaryingSpreadState (variance components) from theta parameters. |
compute_varying_state | Compute VaryingState (BLUPs) from fitted random effects parameters. |
execute_fit | Execute the full fit lifecycle: bundle rebuild → fit → post-fit state → diagnostics. |
fit_glm_irls | Fit generalized linear model using Iteratively Reweighted Least Squares. |
fit_glmer_pirls | Fit generalized linear mixed model using Penalized IRLS. |
fit_lmer_pls | Fit linear mixed-effects model using Penalized Least Squares. |
fit_model | Dispatch to appropriate fitter based on model specification. |
fit_ols_qr | Fit ordinary or weighted least squares using QR decomposition. |
get_theta_lower_bounds | Get lower bounds for theta parameters. |
parse_fit_kwargs | Validate and extract fitting parameters from **kwargs. |
parse_predict_formula | Parse an explore-style formula and build a prediction grid. |
per_factor_re_info | Split global RE metadata into per-factor structures and names. |
resolve_condition_values | Resolve a Condition to concrete values or None. |
resolve_solver | Select the appropriate solver for a model configuration. |
validate_fit_method | Validate and apply a user-specified fitting method to a ModelSpec. |
Modules:
| Name | Description |
|---|---|
convergence | Convergence diagnostics for fitted mixed-effects models. |
diagnostics | Model-level diagnostics computation. |
dispatch | Solver dispatch for model fitting. |
glm | GLM fitting via Iteratively Reweighted Least Squares (IRLS). |
glmer | GLMM fitting via Penalized IRLS (PIRLS). |
grid | Prediction grid construction for formula-mode predictions. |
lifecycle | Fit lifecycle orchestration. |
lmer | LMM fitting via Penalized Least Squares (PLS). |
ols | OLS fitting via QR decomposition. |
predict | Prediction operations on containers. |
varying | Varying parameter extraction for mixed-effects models. |
Attributes¶
VALID_SOLVERS¶
VALID_SOLVERS: frozenset[str] = frozenset({'qr', 'irls', 'pls', 'pirls'})
Classes¶
FitResult¶
Immutable result of the fit lifecycle.
Attributes:
| Name | Type | Description |
|---|---|---|
fit | FitState | Fitted model state (coefficients, residuals, etc.). |
bundle | DataBundle | Data bundle used for fitting (may be rebuilt). |
formula_spec | object | Learned formula spec for newdata evaluation. |
raw_data | DataFrame | None | Original data snapshot (pre-augmentation). |
augmented_data | DataFrame | None | Data with diagnostic columns, or None. |
varying_offsets | VaryingState | None | BLUPs for mixed models, or None. |
varying_spread | VaryingSpreadState | None | Variance components for mixed models, or None. |
Attributes¶
augmented_data¶
augmented_data: pl.DataFrame | None
bundle¶
bundle: DataBundle
fit¶
fit: FitState
formula_spec¶
formula_spec: object
raw_data¶
raw_data: pl.DataFrame | None
varying_offsets¶
varying_offsets: VaryingState | None = None
varying_spread¶
varying_spread: VaryingSpreadState | None = None
Functions¶
augment_data_with_diagnostics¶
augment_data_with_diagnostics(*, raw_data: pl.DataFrame, fit: FitState, bundle: DataBundle) -> pl.DataFrame
Augment raw data with diagnostic columns after fit.
Adds fitted, resid, hat, std_resid, cooksd columns (names from
AugmentedDataCols schema). Values are NaN for rows dropped
due to missing data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
raw_data | DataFrame | Original data DataFrame (pre-NA-drop). | required |
fit | FitState | Fitted state with residuals, fitted values, leverage. | required |
bundle | DataBundle | Data bundle with valid_mask, n_total, p. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with diagnostic columns appended. |
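The NaN-for-dropped-rows behavior described above can be illustrated with a small sketch. This is not the library's implementation: the helper name `scatter_to_full` is hypothetical, and it assumes only that `valid_mask` is a boolean array of length `n_total` and that the fitted quantities cover the valid rows in order.

```python
import numpy as np


def scatter_to_full(values: np.ndarray, valid_mask: np.ndarray) -> np.ndarray:
    """Expand per-valid-row values to full length, NaN where rows were dropped.

    Hypothetical helper for illustration only.
    """
    full = np.full(valid_mask.shape[0], np.nan)  # start with NaN everywhere
    full[valid_mask] = values                    # fill rows that were used in the fit
    return full


# Row 1 was dropped for missing data, so its diagnostic value stays NaN.
mask = np.array([True, False, True])
fitted_full = scatter_to_full(np.array([1.5, 2.5]), mask)
```

The same pattern would apply to each diagnostic column (fitted, resid, hat, std_resid, cooksd) before the columns are appended to the raw data.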
build_mixed_post_fit_state¶
build_mixed_post_fit_state(fit: FitState, bundle: DataBundle, data: pl.DataFrame, *, stacklevel: int = 3) -> tuple[VaryingState | None, VaryingSpreadState | None]
Compute BLUPs, variance components, and emit convergence warnings.
Orchestrates the post-fit assembly for mixed-effects models: computes VaryingState (BLUPs) and VaryingSpreadState (variance components) from the fitted parameters, then checks for convergence issues.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fit | FitState | Fitted model state containing theta, u, sigma. | required |
bundle | DataBundle | Data bundle with RE metadata and valid mask. | required |
data | DataFrame | Original training data (used for group level labels). | required |
stacklevel | int | Warning stacklevel for convergence warnings. Default 3 accounts for: user → model.fit() → build_mixed_post_fit_state(). | 3 |
Returns:
| Type | Description |
|---|---|
tuple[VaryingState | None, VaryingSpreadState | None] | A tuple (varying_offsets, varying_spread) where either may be None if the required fitted parameters are missing. |
build_predict_grid¶
build_predict_grid(data: pl.DataFrame, focal_var: str, response_col: str, grouping_factors: tuple[str, ...], *, focal_values: list[float | str] | None = None, n_points: int | Literal['data'] = 50, varying_vars: list[str] | None = None, at: dict[str, Any] | None = None) -> pl.DataFrame
Build a Cartesian-product prediction grid.
Creates a grid where the focal variable is varied, condition variables are expanded, and all other predictors are held at reference values (mean for continuous, first sorted level for categorical). Grouping factors and the response column are excluded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data | DataFrame | Training data (Polars DataFrame). | required |
focal_var | str | The predictor to vary across the grid. | required |
response_col | str | Response column name (excluded from grid). | required |
grouping_factors | tuple[str, ...] | Random-effect grouping variables (excluded). | required |
focal_values | list[float | str] | None | Explicit values for the focal variable. Overrides default linspace/unique-levels logic. | None |
n_points | int | Literal['data'] | Number of grid points for continuous focal variables. Pass "data" to use the actual observed unique values. | 50 |
varying_vars | list[str] | None | Condition variables to expand (all unique levels). | None |
at | dict[str, Any] | None | Dict of pinned values. Scalar = single constant, list = expand. | None |
Returns:
| Type | Description |
|---|---|
DataFrame | Polars DataFrame with the Cartesian-product prediction grid. |
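As a rough illustration of the grid logic described above (focal variable varied, condition variables expanded, everything else pinned to one reference value), here is a dependency-free sketch. The names `linspace_sketch` and `grid_sketch` are hypothetical, and rows are plain dicts rather than a Polars DataFrame; the real function's internals may differ.

```python
from itertools import product


def linspace_sketch(lo: float, hi: float, n: int) -> list[float]:
    """Evenly spaced values from lo to hi inclusive (hypothetical helper)."""
    if n == 1:
        return [lo]
    return [lo + i * (hi - lo) / (n - 1) for i in range(n)]


def grid_sketch(focal, focal_values, conditions, reference):
    """Cartesian product of focal values and condition levels.

    conditions: dict of variable name -> list of levels to expand.
    reference:  dict of variable name -> single pinned reference value.
    """
    names = [focal, *conditions, *reference]
    # Reference values become single-element axes, so they never multiply rows.
    axes = [focal_values, *conditions.values(), *[[v] for v in reference.values()]]
    return [dict(zip(names, combo)) for combo in product(*axes)]


# 3 focal points x 3 condition levels x 1 reference value = 9 rows.
rows = grid_sketch("wt", linspace_sketch(2.0, 4.0, 3), {"cyl": [4, 6, 8]}, {"hp": 150.0})
```

Treating pinned values as one-element axes keeps a single code path for scalar and list entries, which mirrors the "scalar = single constant, list = expand" rule for `at`.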
check_convergence¶
check_convergence(fit: FitState, re_meta: REInfo) -> list[ConvergenceMessage]
Run convergence diagnostics on a fitted mixed model.
Extracts theta and sigma from FitState, computes theta lower bounds
and per-factor RE info from REInfo, and delegates to
diagnose_convergence() for the actual diagnostic checks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fit | FitState | Fitted model state (must have theta, sigma, converged). | required |
re_meta | REInfo | Random effects metadata from the DataBundle. | required |
Returns:
| Type | Description |
|---|---|
list[ConvergenceMessage] | List of ConvergenceMessage objects. Empty if theta is None. |
compute_diagnostics¶
compute_diagnostics(*, model_type: str, spec: ModelSpec, bundle: DataBundle, fit: FitState, coef_for_predict: np.ndarray, varying_spread: VaryingSpreadState | None, cv: CVState | None, has_intercept: bool = True) -> pl.DataFrame
Compute model-level diagnostics as a single-row DataFrame.
Builds goodness-of-fit diagnostics from fitted model state, with columns varying by model type (lm, glm, lmer, glmer).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model_type | str | One of “lm”, “glm”, “lmer”, “glmer”. | required |
spec | ModelSpec | Model specification (for family). | required |
bundle | DataBundle | Data bundle (for n, rank, X, y, re_metadata). | required |
fit | FitState | Fitted state (for coefficients, residuals, loglik, etc.). | required |
coef_for_predict | ndarray | Coefficients safe for matrix multiplication (NaN replaced by 0 for rank-deficient models). | required |
varying_spread | VaryingSpreadState | None | Random effects variance components (mixed models). | required |
cv | CVState | None | Cross-validation state, or None. | required |
has_intercept | bool | Whether the model includes an intercept. Affects R² computation (centered vs uncentered SS_tot). | True |
Returns:
| Type | Description |
|---|---|
DataFrame | Single-row Polars DataFrame with model diagnostics. See model.diagnostics for full column documentation. |
compute_metadata¶
compute_metadata(*, bundle: DataBundle) -> pl.DataFrame
Compute model metadata as a single-row DataFrame.
Returns sample/structural info about the model: observation counts, parameter count, and group counts (for mixed models).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
bundle | DataBundle | Data bundle (for n, n_total, p, re_metadata). | required |
Returns:
| Type | Description |
|---|---|
DataFrame | Single-row Polars DataFrame with model metadata. |
compute_optimizer_diagnostics¶
compute_optimizer_diagnostics(*, model_type: str, fit: FitState) -> pl.DataFrame
Compute optimizer convergence diagnostics as a single-row DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model_type | str | One of “lm”, “glm”, “lmer”, “glmer”. | required |
fit | FitState | Fitted state with convergence info, theta, dispersion. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | Single-row Polars DataFrame with optimizer diagnostics. |
compute_predictions_from_formula¶
compute_predictions_from_formula(formula: str, data: pl.DataFrame, spec: object, bundle: object, fit: object, formula_spec: object, pred_type: str, varying: str, allow_new_levels: bool, n_points: int | Literal['data']) -> 'PredictionState'
Parse a predict formula, build the grid, compute predictions, and attach grid columns.
Combines parse_predict_formula, compute_predictions, and
grid-column attachment into a single call for model.predict()
formula mode.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | Explore-style formula (e.g. "wt ~ cyl"). | required |
data | DataFrame | Training data. | required |
spec | object | Model specification. | required |
bundle | object | Data bundle. | required |
fit | object | Fitted model state. | required |
formula_spec | object | Learned formula spec for newdata evaluation. | required |
pred_type | str | Prediction scale ("response" or "link"). | required |
varying | str | RE handling ("exclude" or "include"). | required |
allow_new_levels | bool | If True, new groups predict at population level. | required |
n_points | int | Literal['data'] | Number of grid points for continuous focal variables. | required |
Returns:
| Type | Description |
|---|---|
PredictionState | PredictionState with grid columns attached. |
compute_r_squared¶
compute_r_squared(y: np.ndarray, residuals: np.ndarray, n: int, p: int, has_intercept: bool = True) -> tuple[float, float]
Compute R-squared and adjusted R-squared from raw arrays.
For models with an intercept, uses centered SS_tot = sum((y - mean(y))^2).
For no-intercept models, uses uncentered SS_tot = sum(y^2), matching R’s
summary.lm() behavior.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y | ndarray | Response vector of shape (n,). | required |
residuals | ndarray | Residual vector of shape (n,). | required |
n | int | Number of observations. | required |
p | int | Number of parameters (including intercept if present). | required |
has_intercept | bool | Whether the model includes an intercept. | True |
Returns:
| Type | Description |
|---|---|
tuple[float, float] | Tuple of (R-squared, adjusted R-squared). |
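The centered versus uncentered distinction can be made concrete with a short sketch. The function name `r_squared_sketch` is hypothetical and it infers n from the input length; the adjusted value uses n - 1 total degrees of freedom with an intercept and n without, which is the convention in R's summary.lm() and is assumed here rather than confirmed from this library's source.

```python
def r_squared_sketch(y, residuals, p, has_intercept=True):
    """Return (R-squared, adjusted R-squared) from plain Python sequences."""
    n = len(y)
    ss_res = sum(r * r for r in residuals)
    if has_intercept:
        y_bar = sum(y) / n
        ss_tot = sum((v - y_bar) ** 2 for v in y)  # centered total sum of squares
        df_tot = n - 1
    else:
        ss_tot = sum(v * v for v in y)             # uncentered total sum of squares
        df_tot = n
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * df_tot / (n - p)
    return r2, adj_r2


y = [1.0, 2.0, 3.0, 4.0]
resid = [0.1, -0.1, 0.1, -0.1]
r2, adj_r2 = r_squared_sketch(y, resid, p=2)
```

Using the uncentered total for no-intercept models avoids comparing the fit against a mean-only baseline that the model cannot represent.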
compute_varying_spread_state¶
compute_varying_spread_state(theta: NDArray[np.floating], sigma: float, re_meta: REInfo) -> VaryingSpreadState
Compute VaryingSpreadState (variance components) from theta parameters.
Extracts residual variance (sigma²), random effect variances (tau²), correlations (rho), and intraclass correlation (ICC) from the fitted theta vector using the random effects structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | NDArray[floating] | Variance component parameters from the fitted model. | required |
sigma | float | Residual standard deviation from the fitted model. | required |
re_meta | REInfo | Random effects metadata (grouping vars, structure, etc.). | required |
Returns:
| Type | Description |
|---|---|
VaryingSpreadState | VaryingSpreadState container with components DataFrame and decomposed variance quantities. |
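To show how variances, a correlation, and an ICC can be recovered from theta, here is a sketch for a single grouping factor with a random intercept and slope. It assumes the lme4-style parameterization in which theta holds the lower triangle of the relative Cholesky factor, so the random-effect covariance is sigma² · T Tᵀ; the function name and the returned dict keys are hypothetical, and the ICC shown is the simple intercept-only ratio.

```python
import numpy as np


def variance_components_sketch(theta, sigma):
    """Variance components for one factor with random intercept + slope.

    theta = [t11, t21, t22] is assumed to be the lower triangle of the
    relative Cholesky factor T (an lme4-style convention).
    """
    t11, t21, t22 = theta
    T = np.array([[t11, 0.0], [t21, t22]])
    cov = sigma**2 * T @ T.T                    # random-effect covariance matrix
    tau2 = np.diag(cov)                         # random-effect variances
    rho = cov[1, 0] / np.sqrt(tau2[0] * tau2[1])  # intercept-slope correlation
    icc = tau2[0] / (tau2[0] + sigma**2)        # intercept-only ICC
    return {"sigma2": sigma**2, "tau2": tau2, "rho": rho, "icc": icc}


vc = variance_components_sketch([2.0, 0.5, 1.0], sigma=1.5)
```

With these inputs the intercept variance is 9.0 against a residual variance of 2.25, so the ICC comes out at 0.8.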
compute_varying_state¶
compute_varying_state(theta: NDArray[np.floating], u: NDArray[np.floating], re_meta: REInfo, data: pl.DataFrame | None = None) -> VaryingState
Compute VaryingState (BLUPs) from fitted random effects parameters.
Converts spherical random effects u to BLUPs b = Lambda @ u
using the relative covariance factor Lambda built from theta.
Constructs a grid of group/level combinations and maps BLUP values
to named effects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | NDArray[floating] | Variance component parameters from the fitted model. | required |
u | NDArray[floating] | Spherical random effects vector from the fitted model. | required |
re_meta | REInfo | Random effects metadata (grouping vars, structure, etc.). | required |
data | DataFrame | None | Original training data, used to extract unique group levels. If None, levels are labeled "0", "1", etc. | None |
Returns:
| Type | Description |
|---|---|
VaryingState | VaryingState container with grid, effects dict, and group info. |
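The mapping b = Lambda @ u can be sketched for one grouping factor. This assumes theta fills the lower triangle of a k x k template in column-major order and that u is ordered group by group, both lme4-style conventions that this page does not spell out; `blups_sketch` is a hypothetical name, and a dense Lambda is used for clarity where a real implementation would likely keep it sparse.

```python
import numpy as np


def blups_sketch(theta, u, n_groups, k):
    """Convert spherical random effects u to BLUPs b = Lambda @ u.

    Assumes one grouping factor with k random effects per group, theta in
    column-major lower-triangle order, and u ordered group by group.
    """
    T = np.zeros((k, k))
    cols, rows = np.triu_indices(k)   # swapping roles gives column-major lower triangle
    T[rows, cols] = theta
    Lambda = np.kron(np.eye(n_groups), T)  # block-diagonal: one copy of T per group
    b = Lambda @ np.asarray(u, dtype=float)
    return b.reshape(n_groups, k)          # one row per group level


# Two groups, random intercept and slope (k = 2).
b = blups_sketch([2.0, 0.5, 1.0], [1.0, 0.0, 0.0, 1.0], n_groups=2, k=2)
```

Each row of the result corresponds to one group level, which is the shape needed to attach level labels and effect names afterwards.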
execute_fit¶
execute_fit(spec: ModelSpec, bundle: DataBundle | None, data: pl.DataFrame, raw_data: pl.DataFrame | None, formula: str, custom_contrasts: dict | None, weights_col: str | None, offset_col: str | None, missing: str, is_mixed: bool, solver_override: str | None, fit_kwargs: dict) -> FitResult
Execute the full fit lifecycle: bundle rebuild → fit → post-fit state → diagnostics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification. | required |
bundle | DataBundle | None | Existing data bundle, or None to force rebuild. | required |
data | DataFrame | Current data (raw_data-restored by caller). | required |
raw_data | DataFrame | None | Original pre-augmentation snapshot, or None. | required |
formula | str | Formula string for bundle building. | required |
custom_contrasts | dict | None | User contrast matrices, or None. | required |
weights_col | str | None | Weights column name, or None. | required |
offset_col | str | None | Offset column name, or None. | required |
missing | str | Missing value handling ("drop" or "fail"). | required |
is_mixed | bool | Whether this is a mixed-effects model. | required |
solver_override | str | None | Explicit solver, or None for auto. | required |
fit_kwargs | dict | Additional kwargs for fit_model(). | required |
Returns:
| Type | Description |
|---|---|
FitResult | FitResult with all state the model needs to assign. |
fit_glm_irls¶
fit_glm_irls(spec: ModelSpec, bundle: DataBundle, *, max_iter: int = 25, tol: float = 1e-08) -> FitState
Fit generalized linear model using Iteratively Reweighted Least Squares.
This adapter wraps the underlying IRLS implementation (see See Also). IRLS solves GLMs by iterating between computing working weights and solving a weighted least squares problem.
The algorithm proceeds as follows:
1. Initialize mu from y (or link function default).
2. For each iteration:
   a. Compute working weights: W = 1 / (V(mu) * g'(mu)^2)
   b. Compute working response: z = eta + (y - mu) * g'(mu)
   c. Solve weighted least squares: beta = (X'WX)^{-1} X'Wz
   d. Update eta = X @ beta, mu = g^{-1}(eta)
3. Continue until convergence (change in deviance < tol).
Variance functions and links by family:
- gaussian: constant variance, identity link
- binomial: mu(1-mu) variance, logit/probit/cloglog link
- poisson: mu variance, log link
- gamma: mu^2 variance, inverse/log link
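The IRLS steps listed above can be sketched for one concrete case, a Poisson model with log link. This is a minimal illustration under stated assumptions, not the library's code: `irls_poisson_sketch` is a hypothetical name, mu is initialized as y + 0.1 to stay away from log(0), the normal equations are solved directly rather than via a more stable factorization, and convergence uses the absolute change in deviance.

```python
import numpy as np


def irls_poisson_sketch(X, y, max_iter=25, tol=1e-8):
    """Poisson/log-link IRLS; returns (beta, converged, n_iter)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y + 0.1                      # initialize mu from y, shifted off zero
    eta = np.log(mu)
    dev_old = np.inf
    beta = np.zeros(X.shape[1])
    for n_iter in range(1, max_iter + 1):
        # Log link: g'(mu) = 1/mu and V(mu) = mu, so W = 1/(mu * mu^-2) = mu.
        w = mu
        z = eta + (y - mu) / mu       # working response z = eta + (y - mu) * g'(mu)
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ z)  # beta = (X'WX)^{-1} X'Wz
        eta = X @ beta
        mu = np.exp(eta)              # mu = g^{-1}(eta)
        with np.errstate(divide="ignore", invalid="ignore"):
            term = np.where(y > 0, y * np.log(y / mu), 0.0)
        dev = 2.0 * np.sum(term - (y - mu))       # Poisson deviance
        if abs(dev - dev_old) < tol:
            return beta, True, n_iter
        dev_old = dev
    return beta, False, max_iter


X = np.column_stack([np.ones(5), np.arange(5.0)])
y = np.array([1.0, 2.0, 4.0, 7.0, 12.0])
beta, converged, n_iter = irls_poisson_sketch(X, y)
```

At convergence the score equations X'(y - mu) = 0 hold approximately, which is a convenient sanity check on any IRLS implementation.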
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification containing: - family: Distribution family (determines variance function) - link: Link function (determines g and g’) | required |
bundle | DataBundle | Data bundle containing X, y, and optional weights. | required |
max_iter | int | Maximum IRLS iterations (default: 25). | 25 |
tol | float | Convergence tolerance on deviance (default: 1e-8). | 1e-08 |
Returns:
| Type | Description |
|---|---|
FitState | FitState containing: - coef: Coefficient estimates - vcov: Variance-covariance (observed Fisher information) - fitted: Predicted values on response scale - residuals: Response residuals (y - mu) - leverage: Hat matrix diagonal - df_resid: Residual degrees of freedom - loglik: Log-likelihood - dispersion: Estimated dispersion parameter - converged: Whether IRLS converged - n_iter: Number of IRLS iterations |
See Also:
glm: Underlying IRLS implementation
fit_glmer_pirls¶
fit_glmer_pirls(spec: ModelSpec, bundle: DataBundle, *, max_iter: int = 25, max_outer_iter: int = 10000, tol: float = 1e-07, verbose: bool = False, nAGQ: int = 1, use_hessian: bool = False) -> FitState
Fit generalized linear mixed model using Penalized IRLS.
This adapter wraps the underlying PIRLS implementation (see See Also). PIRLS combines IRLS (for the GLM part) with PLS (for random effects), using the Laplace approximation to integrate out the random effects.
Outer loop (BOBYQA optimization over theta). For each theta:
1. Build Lambda from theta
   Inner loop (PIRLS iterations):
   a. Compute working weights from current eta/mu
   b. Compute working response
   c. Solve weighted PLS for beta and u
   d. Update eta = X @ beta + Z @ Lambda @ u
   e. Update mu = g^{-1}(eta)
   f. Step-halving if deviance increased
   g. Check convergence
2. Return Laplace deviance
Select the theta that minimizes the Laplace deviance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification containing: - family: Distribution family - link: Link function - random_terms: Parsed random effect specifications | required |
bundle | DataBundle | Data bundle containing: - X: Fixed effects design matrix - Z: Random effects design matrix (sparse) - y: Response vector - re_metadata: Grouping structure | required |
max_iter | int | Maximum PIRLS iterations per theta (default: 25). | 25 |
max_outer_iter | int | Maximum BOBYQA iterations (default: 10000). | 10000 |
tol | float | PIRLS convergence tolerance (default: 1e-7). | 1e-07 |
verbose | bool | Print optimization progress (default: False). | False |
nAGQ | int | Quadrature points (0 or 1, default: 1). | 1 |
use_hessian | bool | Use Hessian-based vcov (default: False). The default Schur complement approach matches lme4’s vcov() with use.hessian=FALSE and avoids expensive numerical differentiation. Set to True for observed-information vcov. | False |
Returns:
| Type | Description |
|---|---|
FitState | FitState containing: - coef: Fixed effect coefficient estimates - vcov: Variance-covariance (observed information or Schur complement) - fitted: Predicted values on response scale (mu) - residuals: Response residuals (y - mu) - leverage: Approximate leverage values - df_resid: Residual degrees of freedom - loglik: Laplace-approximated log-likelihood - dispersion: Dispersion (1.0 for binomial/poisson) - theta: Optimized relative covariance parameters - u: Spherical random effects - converged: Whether both PIRLS and BOBYQA converged - n_iter: Number of optimizer evaluations |
See Also:
glmer: Underlying PIRLS implementation
fit_lmer_pls¶
fit_lmer_pls(spec: ModelSpec, bundle: DataBundle, *, max_iter: int = 10000, verbose: bool = False) -> FitState
Fit linear mixed-effects model using Penalized Least Squares.
This adapter wraps the underlying PLS implementation (see See Also). PLS is the algorithm from Bates et al. (2015) used in R's lme4 package.
Outer loop (BOBYQA optimization over theta). For each theta (relative covariance parameters):
1. Build Lambda (block-diagonal Cholesky factor from theta)
2. Form S_22 = Lambda' Z' Z Lambda + I
3. Sparse Cholesky factorization of S_22
4. Compute Schur complement for fixed effects
5. Solve for beta (fixed effects) and u (spherical RE)
6. Compute REML or ML deviance
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification containing: - method: “reml” or “ml” (determines objective function) - random_terms: Parsed random effect specifications | required |
bundle | DataBundle | Data bundle containing: - X: Fixed effects design matrix (n x p) - Z: Random effects design matrix (n x q, sparse CSC) - y: Response vector - re_metadata: Grouping structure information | required |
max_iter | int | Maximum BOBYQA iterations (default: 10000). | 10000 |
verbose | bool | Print optimization progress (default: False). | False |
Returns:
| Type | Description |
|---|---|
FitState | FitState containing: - coef: Fixed effect coefficient estimates - vcov: Variance-covariance of fixed effects - fitted: Predicted values (fixed + random) - residuals: Response residuals (y - fitted) - leverage: Approximate leverage values - df_resid: Residual degrees of freedom - loglik: REML or ML log-likelihood - sigma: Residual standard deviation - theta: Optimized relative covariance parameters - u: Spherical random effects (unit variance) - converged: Whether optimizer converged - n_iter: Number of optimizer iterations |
See Also:
lmer: Underlying PLS implementation
fit_model¶
fit_model(spec: ModelSpec, bundle: DataBundle, *, solver: str | None = None, max_iter: int | None = None, max_outer_iter: int = 10000, tol: float | None = None, verbose: bool = False, nAGQ: int = 1, use_hessian: bool = False) -> FitState
Dispatch to appropriate fitter based on model specification.
This is the main entry point for fitting models. It examines the ModelSpec to determine the appropriate solver and delegates to the corresponding fitter function.
If the design matrix is rank-deficient (detected during bundle construction), the X matrix is reduced to estimable columns before fitting. After fitting, coefficients and vcov are expanded back to full size with NaN for dropped columns (matching R’s lm() behavior).
The solver selection follows the estimation method matrix:
| Family | Random Effects | Method | Solver | Description |
|---|---|---|---|---|
| gaussian | No | ols | qr | QR decomposition |
| gaussian | No | ml | irls | Maximum likelihood |
| non-gauss | No | ml | irls | GLM via IRLS |
| gaussian | Yes | reml/ml | pls | Penalized least squares |
| non-gauss | Yes | ml | pirls | Penalized IRLS |
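The table can be read as a small decision function. The sketch below encodes exactly those five rows; `pick_solver_sketch` and its three plain arguments are hypothetical stand-ins for whatever resolve_solver() reads from the ModelSpec.

```python
def pick_solver_sketch(family: str, has_random_effects: bool, method: str) -> str:
    """Map (family, random effects, method) to a solver name per the table."""
    is_gaussian = family == "gaussian"
    if has_random_effects:
        # Mixed models: PLS for gaussian (reml or ml), PIRLS otherwise.
        return "pls" if is_gaussian else "pirls"
    if is_gaussian and method == "ols":
        return "qr"
    # Gaussian fitted by ML and every non-gaussian fixed-effects model use IRLS.
    return "irls"


solver = pick_solver_sketch("poisson", has_random_effects=True, method="ml")
```

Checking the random-effects flag first keeps the gaussian/non-gaussian split to a single comparison in each branch.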
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification containing formula, family, link, method, and parsed formula components. | required |
bundle | DataBundle | Prepared data bundle containing design matrices (X, y, Z), column names, valid observation mask, and optional weights/offset. | required |
solver | str | None | Override solver selection. If None, auto-selected via resolve_solver(). Must be one of "qr", "irls", "pls", "pirls". | None |
max_iter | int | None | Maximum iterations (solver-specific defaults if None). | None |
max_outer_iter | int | Maximum outer (BOBYQA) iterations for GLMER (default: 10000). | 10000 |
tol | float | None | Convergence tolerance (solver-specific defaults if None). | None |
verbose | bool | Print optimization progress (default: False). | False |
nAGQ | int | Quadrature points for GLMER (default: 1). | 1 |
use_hessian | bool | Use Hessian-based vcov for GLMER (default: False). | False |
Returns:
| Type | Description |
|---|---|
FitState | FitState containing all fitting results. |
Examples:
>>> import numpy as np
>>> from containers import build_model_spec, DataBundle
>>> spec = build_model_spec(
... formula="y ~ x",
... response_var="y",
... fixed_terms=["Intercept", "x"],
... )
>>> bundle = DataBundle(
... X=np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]),
... y=np.array([2.0, 4.0, 6.0]),
... X_names=["Intercept", "x"],
... y_name="y",
... valid_mask=np.array([True, True, True]),
... n_total=3,
... )
>>> state = fit_model(spec, bundle)
>>> state.converged
True
>>> state.coef # [Intercept, x] = [0, 2]
array([0., 2.])
fit_ols_qr¶
fit_ols_qr(spec: ModelSpec, bundle: DataBundle) -> FitState
Fit ordinary or weighted least squares using QR decomposition.
Supports observation weights (WLS) and offset terms. When weights are present, solves the transformed system sqrt(W)*X, sqrt(W)*y via QR decomposition, which yields WLS coefficients and vcov directly. Offsets are subtracted from y before fitting and added back to fitted values.
The fit proceeds as follows:
1. Subtract offset from y (if present): y_adj = y - offset
2. Apply weights (if present): X_w = sqrt(w)*X, y_w = sqrt(w)*y_adj
3. QR decompose X_w with column pivoting for stability
4. Solve R * beta = Q.T @ y_w via back-substitution
5. Recompute original-scale: fitted = X @ beta + offset, resid = y - fitted
6. vcov = sigma_w^2 * (X'WX)^{-1}
7. Leverage from (possibly weighted) hat matrix
The log-likelihood matches R's logLik.lm formula:
L = 0.5*sum(log(w)) - n/2 * (log(2*pi) + log(RSS_w/n) + 1)
The 0.5*sum(log(w)) term is the Jacobian from the weight transformation.
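The steps above can be sketched with NumPy alone. Two simplifications are assumptions of this sketch rather than properties of the library: it uses NumPy's plain unpivoted QR instead of the column-pivoted QR described above, and it solves the triangular system with a general solver rather than dedicated back-substitution. The name `wls_qr_sketch` and the dict return value are hypothetical.

```python
import numpy as np


def wls_qr_sketch(X, y, w=None, offset=None):
    """Weighted least squares via (unpivoted) QR, with optional offset."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    offset = np.zeros(n) if offset is None else np.asarray(offset, dtype=float)

    y_adj = y - offset                          # remove offset before fitting
    sw = np.sqrt(w)
    Q, R = np.linalg.qr(X * sw[:, None])        # reduced QR of sqrt(W) X
    beta = np.linalg.solve(R, Q.T @ (sw * y_adj))  # R beta = Q' y_w

    fitted = X @ beta + offset                  # back on the original scale
    resid = y - fitted
    rss_w = np.sum(w * resid**2)
    sigma2 = rss_w / (n - p)
    R_inv = np.linalg.inv(R)
    vcov = sigma2 * R_inv @ R_inv.T             # sigma_w^2 (X'WX)^{-1}
    leverage = np.sum(Q**2, axis=1)             # diagonal of the weighted hat matrix
    loglik = 0.5 * np.sum(np.log(w)) - n / 2 * (
        np.log(2 * np.pi) + np.log(rss_w / n) + 1
    )
    return {"coef": beta, "vcov": vcov, "fitted": fitted,
            "residuals": resid, "leverage": leverage, "loglik": loglik}


X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([2.1, 3.9, 6.2, 7.8])
w = np.array([1.0, 2.0, 1.0, 0.5])
out = wls_qr_sketch(X, y, w=w)
```

Because (X'WX)^{-1} equals R^{-1} R^{-T}, the covariance matrix and leverage both fall out of the same factorization without forming X'WX explicitly.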
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification (unused for OLS, included for interface consistency with other fitters). | required |
bundle | DataBundle | Data bundle containing: - X: Design matrix (n x p) - y: Response vector (n,) - weights: Observation weights (n,) or None for OLS - offset: Offset vector (n,) or None | required |
Returns:
| Type | Description |
|---|---|
FitState | FitState containing: - coef: Coefficient estimates, shape (p,) - vcov: Variance-covariance matrix, shape (p, p) - fitted: Fitted values X @ coef + offset, shape (n,) - residuals: y - fitted, shape (n,) - leverage: Hat matrix diagonal, shape (n,) - df_resid: Residual degrees of freedom (n - rank) - loglik: Gaussian log-likelihood (weighted if applicable) - sigma: Residual standard deviation - converged: Always True (closed-form solution) - n_iter: Always 1 (single step) |
Examples:
>>> import numpy as np
>>> from containers import build_model_spec, DataBundle
>>> spec = build_model_spec(
... formula="y ~ x",
... response_var="y",
... fixed_terms=["Intercept", "x"],
... )
>>> bundle = DataBundle(
... X=np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]),
... y=np.array([2.0, 4.0, 6.0]),
... X_names=["Intercept", "x"],
... y_name="y",
... valid_mask=np.array([True, True, True]),
... n_total=3,
... )
>>> state = fit_ols_qr(spec, bundle)
>>> np.allclose(state.fitted + state.residuals, bundle.y)
True
>>> np.allclose(state.coef, [0.0, 2.0]) # Perfect fit: y = 2x
True
get_theta_lower_bounds¶
get_theta_lower_bounds(n_theta: int, re_structure: str, metadata: dict | None = None) -> list[float]
Get lower bounds for theta parameters.
Diagonal elements of Cholesky factor must be non-negative. Off-diagonal elements are unbounded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n_theta | int | Number of theta parameters | required |
re_structure | str | Random effects structure type | required |
metadata | dict | None | Optional metadata dict with ‘re_structures_list’ for crossed/nested/mixed structures | None |
Returns:
| Type | Description |
|---|---|
list[float] | List of lower bounds |
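The rule stated above (diagonal elements bounded below by zero, off-diagonals unbounded) can be illustrated for a single unstructured k x k block. The sketch assumes theta walks the lower triangle in column-major order, an lme4-style convention not confirmed by this page, and `lower_bounds_sketch` is a hypothetical name that ignores the re_structure and metadata arguments of the real function.

```python
import math


def lower_bounds_sketch(k: int) -> list[float]:
    """Lower bounds for the k*(k+1)/2 thetas of one k x k Cholesky block."""
    bounds = []
    for col in range(k):
        for row in range(col, k):   # column-major walk over the lower triangle
            # Diagonal entries must be non-negative; off-diagonals are free.
            bounds.append(0.0 if row == col else -math.inf)
    return bounds


bounds = lower_bounds_sketch(2)  # random intercept + slope: 3 thetas
```

For crossed or nested models, one such list per factor would presumably be concatenated in factor order.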
parse_fit_kwargs¶
parse_fit_kwargs(spec: ModelSpec, kwargs: dict[str, object], nAGQ: int | None) -> tuple[ModelSpec, str | None, dict[str, object]]
Validate and extract fitting parameters from **kwargs.
Pops solver, method, and nAGQ from kwargs, validates each,
and assembles the remaining fit-specific keyword arguments into a dict
suitable for fit_model().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Current model specification (may be evolved if method is set). | required |
kwargs | dict[str, object] | Mutable dict of user-supplied keyword arguments. Recognized keys are popped: solver, method, max_iter, max_outer_iter, tol, verbose, nAGQ, use_hessian. | required |
nAGQ | int | None | Explicit nAGQ parameter from the fit() signature (takes precedence over any value in kwargs). | required |
Returns:
| Type | Description |
|---|---|
tuple[ModelSpec, str | None, dict[str, object]] | A tuple (updated_spec, solver_override, fit_kwargs) where: updated_spec has the validated method applied (if method was set); solver_override is the validated solver string, or None; fit_kwargs is a dict ready to splat into fit_model(). |
parse_predict_formula¶
parse_predict_formula(formula: str, data: pl.DataFrame, response_col: str, grouping_factors: tuple[str, ...], *, n_points: int | Literal['data'] = 50) -> tuple[pl.DataFrame, list[str]]
Parse an explore-style formula and build a prediction grid.
Translates the formula via parse_explore_formula, rejects
contrast formulas, and delegates to build_predict_grid.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | Explore-style formula (e.g. "wt ~ cyl"). | required |
data | DataFrame | Training data. | required |
response_col | str | Response column name. | required |
grouping_factors | tuple[str, ...] | Random-effect grouping variables. | required |
n_points | int | Literal['data'] | Number of grid points for continuous focal variables. | 50 |
Returns:
| Type | Description |
|---|---|
tuple[DataFrame, list[str]] | Tuple of (grid DataFrame, list of grid column names for output). The grid column names are the focal var plus any condition vars (the columns that vary across the grid, excluding reference-value columns). |
per_factor_re_info¶
per_factor_re_info(re_meta: REInfo, group_names: list[str]) -> tuple[str | list[str], list[str] | dict[str, list[str]]]
Split global RE metadata into per-factor structures and names.
For crossed/nested/mixed models, the global re_structure is a single
string (e.g. “crossed”) and random_names is a concatenated list across
all factors. This function splits them into per-factor structures and
per-factor name dicts suitable for BLUP decomposition and convergence
diagnostics.
For single-factor models, returns the originals unchanged.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
re_meta | REInfo | Random effects metadata from the fitted model’s DataBundle. | required |
group_names | list[str] | Ordered list of grouping variable names (e.g. ["subject"] or ["subject", "item"]). | required |
Returns:
| Type | Description |
|---|---|
tuple[str | list[str], list[str] | dict[str, list[str]]] | A tuple (re_structure, random_names) where: - For single-factor models: (str, list[str]), the originals. - For multi-factor models: (list[str], dict[str, list[str]]), a per-factor structure list and a dict mapping group name to its random effect names. |
resolve_condition_values¶
resolve_condition_values(cond: Condition, data: pl.DataFrame) -> list | None
Resolve a Condition to concrete values or None.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cond | Condition | A Condition from parse_explore_formula(). | required |
data | DataFrame | The model’s training data. | required |
Returns:
| Type | Description |
|---|---|
list | None | A list of concrete values if the condition specifies explicit values (at_values, at_range, at_quantile), or None for bare conditions (use all unique levels). |
resolve_solver¶
resolve_solver(spec: ModelSpec) -> str
Select the appropriate solver for a model configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification. | required |
Returns:
| Type | Description |
|---|---|
str | Solver name: “qr”, “irls”, “pls”, or “pirls”. |
validate_fit_method¶
validate_fit_method(spec: ModelSpec, method_str: str) -> ModelSpec
Validate and apply a user-specified fitting method to a ModelSpec.
Checks that the method is compatible with the model’s family and random-effects structure, then returns an evolved spec with the new method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Current model specification. | required |
method_str | str | User-supplied method string (e.g. "ols", "ml", "reml"). Will be lowercased. | required |
Returns:
| Type | Description |
|---|---|
ModelSpec | Evolved ModelSpec with the validated method applied. |
Modules¶
convergence¶
Convergence diagnostics for fitted mixed-effects models.
Encapsulates the repeated pattern of extracting theta, computing bounds, assembling per-factor RE info, and running diagnose_convergence().
Functions:
| Name | Description |
|---|---|
check_convergence | Run convergence diagnostics on a fitted mixed model. |
Classes¶
Functions¶
check_convergence¶
check_convergence(fit: FitState, re_meta: REInfo) -> list[ConvergenceMessage]
Run convergence diagnostics on a fitted mixed model.
Extracts theta and sigma from FitState, computes theta lower bounds
and per-factor RE info from REInfo, and delegates to
diagnose_convergence() for the actual diagnostic checks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fit | FitState | Fitted model state (must have theta, sigma, converged). | required |
re_meta | REInfo | Random effects metadata from the DataBundle. | required |
Returns:
| Type | Description |
|---|---|
list[ConvergenceMessage] | List of ConvergenceMessage objects. Empty if theta is None. |
diagnostics¶
Model-level diagnostics computation.
Pure functions that compute model diagnostics from containers. These were
extracted from model/core.py to keep the model class as thin glue.
Attributes¶
Classes¶
Functions¶
augment_data_with_diagnostics¶
augment_data_with_diagnostics(*, raw_data: pl.DataFrame, fit: FitState, bundle: DataBundle) -> pl.DataFrame
Augment raw data with diagnostic columns after fit.
Adds fitted, resid, hat, std_resid, cooksd columns (names from
AugmentedDataCols schema). Values are NaN for rows dropped
due to missing data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
raw_data | DataFrame | Original data DataFrame (pre-NA-drop). | required |
fit | FitState | Fitted state with residuals, fitted values, leverage. | required |
bundle | DataBundle | Data bundle with valid_mask, n_total, p. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with diagnostic columns appended. |
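The NaN-filling behaviour described above can be sketched as follows. This is a rough illustration only: the helper names are hypothetical, and the standardized-residual and Cook's distance formulas are the textbook linear-model ones, assumed here because the source does not state which formulas the library uses.

```python
import numpy as np


def scatter_to_full(values, valid_mask):
    """Place per-valid-row values into a full-length array, NaN elsewhere."""
    full = np.full(valid_mask.shape[0], np.nan)
    full[valid_mask] = values
    return full


def lm_diagnostic_columns(resid, hat, sigma, p, valid_mask):
    """Hypothetical helper using textbook lm formulas (an assumption)."""
    # Internally standardized residuals: e_i / (sigma * sqrt(1 - h_i))
    std_resid = resid / (sigma * np.sqrt(1.0 - hat))
    # Cook's distance: r_i^2 * h_i / (p * (1 - h_i))
    cooksd = std_resid**2 * hat / (p * (1.0 - hat))
    return {
        "resid": scatter_to_full(resid, valid_mask),
        "hat": scatter_to_full(hat, valid_mask),
        "std_resid": scatter_to_full(std_resid, valid_mask),
        "cooksd": scatter_to_full(cooksd, valid_mask),
    }
```

Rows where valid_mask is False end up as NaN in every diagnostic column, so the output keeps the row count of the original data.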
compute_diagnostics¶
compute_diagnostics(*, model_type: str, spec: ModelSpec, bundle: DataBundle, fit: FitState, coef_for_predict: np.ndarray, varying_spread: VaryingSpreadState | None, cv: CVState | None, has_intercept: bool = True) -> pl.DataFrame
Compute model-level diagnostics as a single-row DataFrame.
Builds goodness-of-fit diagnostics from fitted model state, with columns varying by model type (lm, glm, lmer, glmer).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model_type | str | One of “lm”, “glm”, “lmer”, “glmer”. | required |
spec | ModelSpec | Model specification (for family). | required |
bundle | DataBundle | Data bundle (for n, rank, X, y, re_metadata). | required |
fit | FitState | Fitted state (for coefficients, residuals, loglik, etc.). | required |
coef_for_predict | ndarray | Coefficients safe for matrix multiplication (NaN replaced by 0 for rank-deficient models). | required |
varying_spread | VaryingSpreadState | None | Random effects variance components (mixed models). | required |
cv | CVState | None | Cross-validation state, or None. | required |
has_intercept | bool | Whether the model includes an intercept. Affects R² computation (centered vs uncentered SS_tot). | True |
Returns:
| Type | Description |
|---|---|
DataFrame | Single-row Polars DataFrame with model diagnostics. See model.diagnostics for full column documentation. |
compute_metadata¶
compute_metadata(*, bundle: DataBundle) -> pl.DataFrame
Compute model metadata as a single-row DataFrame.
Returns sample/structural info about the model: observation counts, parameter count, and group counts (for mixed models).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
bundle | DataBundle | Data bundle (for n, n_total, p, re_metadata). | required |
Returns:
| Type | Description |
|---|---|
DataFrame | Single-row Polars DataFrame with model metadata. |
compute_optimizer_diagnostics¶
compute_optimizer_diagnostics(*, model_type: str, fit: FitState) -> pl.DataFrame
Compute optimizer convergence diagnostics as a single-row DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model_type | str | One of “lm”, “glm”, “lmer”, “glmer”. | required |
fit | FitState | Fitted state with convergence info, theta, dispersion. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | Single-row Polars DataFrame with optimizer diagnostics. |
compute_r_squared¶
compute_r_squared(y: np.ndarray, residuals: np.ndarray, n: int, p: int, has_intercept: bool = True) -> tuple[float, float]
Compute R-squared and adjusted R-squared from raw arrays.
For models with an intercept, uses centered SS_tot = sum((y - mean(y))^2).
For no-intercept models, uses uncentered SS_tot = sum(y^2), matching R’s
summary.lm() behavior.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y | ndarray | Response vector of shape (n,). | required |
residuals | ndarray | Residual vector of shape (n,). | required |
n | int | Number of observations. | required |
p | int | Number of parameters (including intercept if present). | required |
has_intercept | bool | Whether the model includes an intercept. | True |
Returns:
| Type | Description |
|---|---|
tuple[float, float] | Tuple of (R-squared, adjusted R-squared). |
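A minimal sketch of the centered versus uncentered computation described above. The function name is hypothetical, and the adjusted R-squared formula is an assumption: it follows R's summary.lm(), which uses n - 1 total degrees of freedom with an intercept and n without.

```python
import numpy as np


def r_squared_sketch(y, residuals, n, p, has_intercept=True):
    """Illustrative only; not the library's implementation."""
    ss_res = float(np.sum(residuals**2))
    if has_intercept:
        # Centered total sum of squares
        ss_tot = float(np.sum((y - y.mean()) ** 2))
        df_tot = n - 1
    else:
        # Uncentered total sum of squares
        ss_tot = float(np.sum(y**2))
        df_tot = n
    r2 = 1.0 - ss_res / ss_tot
    # Assumption: adjustment as in R's summary.lm()
    adj_r2 = 1.0 - (1.0 - r2) * df_tot / (n - p)
    return r2, adj_r2
```

The same residuals give a higher R-squared in the no-intercept case because sum(y^2) is never smaller than the centered sum of squares.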
dispatch¶
Solver dispatch for model fitting.
Provides fit_model() which dispatches to the appropriate fitter based on model specification, and resolve_solver() which determines the solver type.
Handles rank-deficient design matrices by reducing X before fitting and expanding coefficients/vcov after, inserting NaN for dropped columns.
Functions:
| Name | Description |
|---|---|
fit_model | Dispatch to appropriate fitter based on model specification. |
parse_fit_kwargs | Validate and extract fitting parameters from **kwargs. |
resolve_solver | Select the appropriate solver for a model configuration. |
validate_fit_method | Validate and apply a user-specified fitting method to a ModelSpec. |
Attributes:
| Name | Type | Description |
|---|---|---|
VALID_SOLVERS | frozenset[str] |
Attributes¶
VALID_SOLVERS¶
VALID_SOLVERS: frozenset[str] = frozenset({'qr', 'irls', 'pls', 'pirls'})
Classes¶
Functions¶
fit_model¶
fit_model(spec: ModelSpec, bundle: DataBundle, *, solver: str | None = None, max_iter: int | None = None, max_outer_iter: int = 10000, tol: float | None = None, verbose: bool = False, nAGQ: int = 1, use_hessian: bool = False) -> FitState
Dispatch to appropriate fitter based on model specification.
This is the main entry point for fitting models. It examines the ModelSpec to determine the appropriate solver and delegates to the corresponding fitter function.
If the design matrix is rank-deficient (detected during bundle construction), the X matrix is reduced to estimable columns before fitting. After fitting, coefficients and vcov are expanded back to full size with NaN for dropped columns (matching R’s lm() behavior).
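The expansion step for rank-deficient fits might look like the sketch below. The keep_mask argument (True for estimable columns) and both helper names are assumptions for illustration; the source does not say how the dropped columns are recorded.

```python
import numpy as np


def expand_coef(coef_reduced, keep_mask):
    """Expand reduced coefficients to full length, NaN for dropped columns."""
    full = np.full(keep_mask.shape[0], np.nan)
    full[keep_mask] = coef_reduced
    return full


def expand_vcov(vcov_reduced, keep_mask):
    """Expand a reduced vcov to (p, p), NaN in dropped rows and columns."""
    p = keep_mask.shape[0]
    full = np.full((p, p), np.nan)
    idx = np.flatnonzero(keep_mask)
    # np.ix_ builds the row/column index grid for the estimable block
    full[np.ix_(idx, idx)] = vcov_reduced
    return full
```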
The solver selection follows the estimation method matrix:
| Family | Random Effects | Method | Solver | Description |
|---|---|---|---|---|
| gaussian | No | ols | qr | QR decomposition |
| gaussian | No | ml | irls | Maximum likelihood |
| non-gauss | No | ml | irls | GLM via IRLS |
| gaussian | Yes | reml/ml | pls | Penalized least squares |
| non-gauss | Yes | ml | pirls | Penalized IRLS |
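The table above can be read as a small decision function. The sketch below is a standalone illustration that takes plain values instead of a ModelSpec; pick_solver is a hypothetical name and not the signature of resolve_solver().

```python
def pick_solver(family: str, has_random_effects: bool, method: str) -> str:
    """Mirror the estimation method matrix with plain arguments."""
    is_gaussian = family == "gaussian"
    if has_random_effects:
        # Mixed models: PLS for gaussian, PIRLS otherwise
        return "pls" if is_gaussian else "pirls"
    if is_gaussian and method == "ols":
        return "qr"
    # Gaussian with method="ml" and all non-gaussian families use IRLS
    return "irls"
```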
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification containing formula, family, link, method, and parsed formula components. | required |
bundle | DataBundle | Prepared data bundle containing design matrices (X, y, Z), column names, valid observation mask, and optional weights/offset. | required |
solver | str | None | Override solver selection. If None, auto-selected via resolve_solver(). Must be one of "qr", "irls", "pls", "pirls". | None |
max_iter | int | None | Maximum iterations (solver-specific defaults if None). | None |
max_outer_iter | int | Maximum outer (BOBYQA) iterations for GLMER (default: 10000). | 10000 |
tol | float | None | Convergence tolerance (solver-specific defaults if None). | None |
verbose | bool | Print optimization progress (default: False). | False |
nAGQ | int | Quadrature points for GLMER (default: 1). | 1 |
use_hessian | bool | Use Hessian-based vcov for GLMER (default: False). | False |
Returns:
| Type | Description |
|---|---|
FitState | FitState containing all fitting results. |
Examples:
>>> import numpy as np
>>> from containers import build_model_spec, DataBundle
>>> spec = build_model_spec(
... formula="y ~ x",
... response_var="y",
... fixed_terms=["Intercept", "x"],
... )
>>> bundle = DataBundle(
... X=np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]),
... y=np.array([2.0, 4.0, 6.0]),
... X_names=["Intercept", "x"],
... y_name="y",
... valid_mask=np.array([True, True, True]),
... n_total=3,
... )
>>> state = fit_model(spec, bundle)
>>> state.converged
True
>>> state.coef # [Intercept, x] = [0, 2]
array([0., 2.])
parse_fit_kwargs¶
parse_fit_kwargs(spec: ModelSpec, kwargs: dict[str, object], nAGQ: int | None) -> tuple[ModelSpec, str | None, dict[str, object]]
Validate and extract fitting parameters from **kwargs.
Pops solver, method, and nAGQ from kwargs, validates each,
and assembles the remaining fit-specific keyword arguments into a dict
suitable for fit_model().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Current model specification (may be evolved if method is set). | required |
kwargs | dict[str, object] | Mutable dict of user-supplied keyword arguments. Recognized keys are popped: solver, method, max_iter, max_outer_iter, tol, verbose, nAGQ, use_hessian. | required |
nAGQ | int | None | Explicit nAGQ parameter from the fit() signature (takes precedence over any value in kwargs). | required |
Returns:
| Type | Description |
|---|---|
tuple[ModelSpec, str | None, dict[str, object]] | A tuple (updated_spec, solver_override, fit_kwargs) where: - updated_spec has the validated method applied (if method was set). - solver_override is the validated solver string, or None. - fit_kwargs is a dict ready to splat into fit_model(). |
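The pop-and-validate pattern described for parse_fit_kwargs can be sketched roughly as below. This is a simplified stand-in: split_fit_kwargs and FIT_KEYS are hypothetical names, and the sketch returns the lowercased method string where the real function returns an evolved ModelSpec.

```python
VALID_SOLVERS = frozenset({"qr", "irls", "pls", "pirls"})
# Assumed set of pass-through keys, taken from the fit_model() signature
FIT_KEYS = ("max_iter", "max_outer_iter", "tol", "verbose", "use_hessian")


def split_fit_kwargs(kwargs, nAGQ=None):
    """Pop recognized keys from kwargs (mutated in place)."""
    solver = kwargs.pop("solver", None)
    if solver is not None and solver not in VALID_SOLVERS:
        raise ValueError(f"unknown solver: {solver!r}")
    method = kwargs.pop("method", None)
    if method is not None:
        method = str(method).lower()
    kw_nagq = kwargs.pop("nAGQ", None)
    fit_kwargs = {k: kwargs.pop(k) for k in FIT_KEYS if k in kwargs}
    # The explicit nAGQ argument takes precedence over the kwargs value
    chosen = nAGQ if nAGQ is not None else kw_nagq
    if chosen is not None:
        fit_kwargs["nAGQ"] = chosen
    return method, solver, fit_kwargs
```

Anything left in kwargs after the call is an unrecognized key, which a caller could report as an error.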
resolve_solver¶
resolve_solver(spec: ModelSpec) -> str
Select the appropriate solver for a model configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification. | required |
Returns:
| Type | Description |
|---|---|
str | Solver name: “qr”, “irls”, “pls”, or “pirls”. |
validate_fit_method¶
validate_fit_method(spec: ModelSpec, method_str: str) -> ModelSpec
Validate and apply a user-specified fitting method to a ModelSpec.
Checks that the method is compatible with the model’s family and random-effects structure, then returns an evolved spec with the new method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Current model specification. | required |
method_str | str | User-supplied method string (e.g. "ols", "ml", "reml"). Will be lowercased. | required |
Returns:
| Type | Description |
|---|---|
ModelSpec | Evolved ModelSpec with the validated method applied. |
glm¶
GLM fitting via Iteratively Reweighted Least Squares (IRLS).
Functions:
| Name | Description |
|---|---|
fit_glm_irls | Fit generalized linear model using Iteratively Reweighted Least Squares. |
Classes¶
Functions¶
fit_glm_irls¶
fit_glm_irls(spec: ModelSpec, bundle: DataBundle, *, max_iter: int = 25, tol: float = 1e-08) -> FitState
Fit generalized linear model using Iteratively Reweighted Least Squares.
This adapter wraps the underlying IRLS implementation. IRLS solves GLMs by iterating between computing working weights and solving a weighted least squares problem.
The algorithm proceeds as follows:
1. Initialize mu from y (or link function default)
2. For each iteration:
   a. Compute working weights: W = 1 / (V(mu) * g’(mu)^2)
   b. Compute working response: z = eta + (y - mu) * g’(mu)
   c. Solve weighted least squares: beta = (X’WX)^{-1} X’Wz
   d. Update eta = X @ beta, mu = g^{-1}(eta)
3. Continue until convergence (change in deviance < tol)
Supported families:
- gaussian: constant variance (V(mu) = 1), identity link
- binomial: mu(1-mu) variance, logit/probit/cloglog link
- poisson: mu variance, log link
- gamma: mu^2 variance, inverse/log link
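The iteration above can be made concrete for one family. The sketch below is a standalone Poisson/log-link IRLS in NumPy, written for illustration and not taken from the library; the function names and the mu = y + 0.1 starting value are assumptions.

```python
import numpy as np


def poisson_deviance(y, mu):
    """2 * sum(y*log(y/mu) - (y - mu)); the log term is 0 where y == 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(y > 0, y * np.log(y / mu), 0.0)
    return 2.0 * float(np.sum(term - (y - mu)))


def irls_poisson_log(X, y, max_iter=25, tol=1e-8):
    """Return (beta, converged, n_iter) for a Poisson GLM with log link."""
    mu = y + 0.1               # initialize mu from y, shifted away from 0
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    dev_old = np.inf
    for n_iter in range(1, max_iter + 1):
        g_prime = 1.0 / mu                     # g(mu) = log(mu)
        W = 1.0 / (mu * g_prime**2)            # V(mu) = mu, so W = mu
        z = eta + (y - mu) * g_prime           # working response
        XtW = X.T * W                          # X' W without forming diag(W)
        beta = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta
        mu = np.exp(eta)
        dev = poisson_deviance(y, mu)
        if abs(dev - dev_old) < tol:           # change in deviance < tol
            return beta, True, n_iter
        dev_old = dev
    return beta, False, max_iter
```

Solving the normal equations directly keeps the sketch short; a production fitter would more likely use a QR or Cholesky solve on the weighted system for numerical stability.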
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification containing: - family: Distribution family (determines variance function) - link: Link function (determines g and g’) | required |
bundle | DataBundle | Data bundle containing X, y, and optional weights. | required |
max_iter | int | Maximum IRLS iterations (default: 25). | 25 |
tol | float | Convergence tolerance on deviance (default: 1e-8). | 1e-08 |
Returns:
| Type | Description |
|---|---|
FitState | FitState containing: - coef: Coefficient estimates - vcov: Variance-covariance (observed Fisher information) - fitted: Predicted values on response scale - residuals: Response residuals (y - mu) - leverage: Hat matrix diagonal - df_resid: Residual degrees of freedom - loglik: Log-likelihood - dispersion: Estimated dispersion parameter - converged: Whether IRLS converged - n_iter: Number of IRLS iterations |
See Also:
glm: Underlying IRLS implementation
Modules¶
glmer¶
GLMM fitting via Penalized IRLS (PIRLS).
Functions:
| Name | Description |
|---|---|
fit_glmer_pirls | Fit generalized linear mixed model using Penalized IRLS. |
Classes¶
Functions¶
fit_glmer_pirls¶
fit_glmer_pirls(spec: ModelSpec, bundle: DataBundle, *, max_iter: int = 25, max_outer_iter: int = 10000, tol: float = 1e-07, verbose: bool = False, nAGQ: int = 1, use_hessian: bool = False) -> FitState
Fit generalized linear mixed model using Penalized IRLS.
This adapter wraps the underlying PIRLS implementation. PIRLS combines IRLS (for the GLM part) with PLS (for random effects), using Laplace approximation to integrate out the random effects.
Outer loop (BOBYQA optimization over theta):
For each theta:
1. Build Lambda from theta
2. Inner loop (PIRLS iterations):
   a. Compute working weights from current eta/mu
   b. Compute working response
   c. Solve weighted PLS for beta and u
   d. Update eta = X @ beta + Z @ Lambda @ u
   e. Update mu = g^{-1}(eta)
   f. Step-halving if deviance increased
   g. Check convergence
3. Return Laplace deviance
Select theta minimizing Laplace deviance
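The inner loop can be illustrated at a single fixed theta. The sketch below assumes a Poisson response, a log link, and one scalar random intercept (so Lambda = theta * I), with dense NumPy algebra standing in for sparse factorizations. All names are hypothetical, the outer BOBYQA search is not shown, and the returned value equals the Laplace deviance only up to an additive constant that does not depend on theta.

```python
import numpy as np


def poisson_deviance(y, mu):
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(y > 0, y * np.log(y / mu), 0.0)
    return 2.0 * float(np.sum(term - (y - mu)))


def pirls_fixed_theta(theta, X, Z, y, max_iter=25, tol=1e-7):
    """Return (beta, u, laplace_objective, converged) at one theta."""
    q = Z.shape[1]
    ZL = Z * theta                        # Z @ Lambda with Lambda = theta * I
    M = np.hstack([ZL, X])                # joint design for stacked (u, beta)
    penalty = np.zeros(M.shape[1])
    penalty[:q] = 1.0                     # ridge penalty on u only
    coef = np.zeros(M.shape[1])
    eta = np.log(y + 0.1)                 # start from the data
    mu = np.exp(eta)
    pdev_old, pdev, converged = np.inf, np.inf, False
    for _ in range(max_iter):
        W = mu                            # a. Poisson/log working weights
        z = eta + (y - mu) / mu           # b. working response
        MtW = M.T * W
        target = np.linalg.solve(MtW @ M + np.diag(penalty), MtW @ z)  # c.
        step = 1.0
        for _halving in range(10):        # f. step-halving
            trial = coef + step * (target - coef)
            eta = M @ trial               # d. eta = X beta + Z Lambda u
            mu = np.exp(eta)              # e. mu = g^{-1}(eta)
            pdev = poisson_deviance(y, mu) + float(trial[:q] @ trial[:q])
            if pdev <= pdev_old:
                break
            step *= 0.5
        coef = trial
        if abs(pdev_old - pdev) < tol * (abs(pdev) + tol):  # g.
            converged = True
            break
        pdev_old = pdev
    # Laplace term: log-determinant of Lambda' Z' W Z Lambda + I
    L = np.linalg.cholesky((ZL.T * mu) @ ZL + np.eye(q))
    laplace = pdev + 2.0 * float(np.sum(np.log(np.diag(L))))
    return coef[q:], coef[:q], laplace, converged
```

An outer optimizer would call this function repeatedly and keep the theta with the smallest returned objective.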
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification containing: - family: Distribution family - link: Link function - random_terms: Parsed random effect specifications | required |
bundle | DataBundle | Data bundle containing: - X: Fixed effects design matrix - Z: Random effects design matrix (sparse) - y: Response vector - re_metadata: Grouping structure | required |
max_iter | int | Maximum PIRLS iterations per theta (default: 25). | 25 |
max_outer_iter | int | Maximum BOBYQA iterations (default: 10000). | 10000 |
tol | float | PIRLS convergence tolerance (default: 1e-7). | 1e-07 |
verbose | bool | Print optimization progress (default: False). | False |
nAGQ | int | Quadrature points (0 or 1, default: 1). | 1 |
use_hessian | bool | Use Hessian-based vcov (default: False). The default Schur complement approach matches lme4’s vcov() with use.hessian=FALSE and avoids expensive numerical differentiation. Set to True for observed-information vcov. | False |
Returns:
| Type | Description |
|---|---|
FitState | FitState containing: - coef: Fixed effect coefficient estimates - vcov: Variance-covariance (observed information or Schur complement) - fitted: Predicted values on response scale (mu) - residuals: Response residuals (y - mu) - leverage: Approximate leverage values - df_resid: Residual degrees of freedom - loglik: Laplace-approximated log-likelihood - dispersion: Dispersion (1.0 for binomial/poisson) - theta: Optimized relative covariance parameters - u: Spherical random effects - converged: Whether both PIRLS and BOBYQA converged - n_iter: Number of optimizer evaluations |
See Also:
glmer: Underlying PIRLS implementation
grid¶
Prediction grid construction for formula-mode predictions.
Provides parse_predict_formula() which translates an explore-style
formula into a prediction grid (Polars DataFrame), and
build_predict_grid() which assembles the Cartesian-product grid from
column specifications.
Shared by model.predict() (formula mode) and viz/predict.py
(plot_predict).
Functions:
| Name | Description |
|---|---|
build_predict_grid | Build a Cartesian-product prediction grid. |
compute_predictions_from_formula | Parse a predict formula, build the grid, compute predictions, and attach grid columns. |
parse_predict_formula | Parse an explore-style formula and build a prediction grid. |
resolve_condition_values | Resolve a Condition to concrete values or None. |
Classes¶
Functions¶
build_predict_grid¶
build_predict_grid(data: pl.DataFrame, focal_var: str, response_col: str, grouping_factors: tuple[str, ...], *, focal_values: list[float | str] | None = None, n_points: int | Literal['data'] = 50, varying_vars: list[str] | None = None, at: dict[str, Any] | None = None) -> pl.DataFrame
Build a Cartesian-product prediction grid.
Creates a grid where the focal variable is varied, condition variables are expanded, and all other predictors are held at reference values (mean for continuous, first sorted level for categorical). Grouping factors and the response column are excluded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data | DataFrame | Training data (Polars DataFrame). | required |
focal_var | str | The predictor to vary across the grid. | required |
response_col | str | Response column name (excluded from grid). | required |
grouping_factors | tuple[str, ...] | Random-effect grouping variables (excluded). | required |
focal_values | list[float | str] | None | Explicit values for the focal variable. Overrides default linspace/unique-levels logic. | None |
n_points | int | Literal['data'] | Number of grid points for continuous focal variables. Use "data" to use actual observed unique values. | 50 |
varying_vars | list[str] | None | Condition variables to expand (all unique levels). | None |
at | dict[str, Any] | None | Dict of pinned values. Scalar = single constant, list = expand. | None |
Returns:
| Type | Description |
|---|---|
DataFrame | Polars DataFrame with the Cartesian-product prediction grid. |
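The grid logic can be sketched without Polars, using plain dicts of column lists. Everything here is illustrative: grid_sketch is a hypothetical name, the focal range is assumed to run from the observed minimum to the observed maximum, and the focal_values and at options are left out.

```python
from itertools import product
from statistics import fmean


def grid_sketch(columns, focal_var, exclude, n_points=5, varying_vars=()):
    """Build a list of row dicts forming a Cartesian-product grid."""
    axes = {}
    for name, values in columns.items():
        if name in exclude:               # response and grouping factors
            continue
        numeric = all(isinstance(v, (int, float)) for v in values)
        if name == focal_var and numeric:
            lo, hi = min(values), max(values)
            # Evenly spaced points between the observed extremes
            axes[name] = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
        elif name == focal_var or name in varying_vars:
            axes[name] = sorted(set(values))      # all unique levels
        elif numeric:
            axes[name] = [fmean(values)]          # reference: mean
        else:
            axes[name] = [sorted(set(values))[0]]  # reference: first level
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[k] for k in names))]
```

With a continuous focal variable at 3 points and one condition variable with 3 levels, the grid has 9 rows, and every other predictor is constant across them.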
compute_predictions_from_formula¶
compute_predictions_from_formula(formula: str, data: pl.DataFrame, spec: object, bundle: object, fit: object, formula_spec: object, pred_type: str, varying: str, allow_new_levels: bool, n_points: int | Literal['data']) -> 'PredictionState'
Parse a predict formula, build the grid, compute predictions, and attach grid columns.
Combines parse_predict_formula, compute_predictions, and
grid-column attachment into a single call for model.predict()
formula mode.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | Explore-style formula (e.g. "wt ~ cyl"). | required |
data | DataFrame | Training data. | required |
spec | object | Model specification. | required |
bundle | object | Data bundle. | required |
fit | object | Fitted model state. | required |
formula_spec | object | Learned formula spec for newdata evaluation. | required |
pred_type | str | Prediction scale ("response" or "link"). | required |
varying | str | RE handling ("exclude" or "include"). | required |
allow_new_levels | bool | If True, new groups predict at population level. | required |
n_points | int | Literal['data'] | Number of grid points for continuous focal variables. | required |
Returns:
| Type | Description |
|---|---|
PredictionState | PredictionState with grid columns attached. |
parse_predict_formula¶
parse_predict_formula(formula: str, data: pl.DataFrame, response_col: str, grouping_factors: tuple[str, ...], *, n_points: int | Literal['data'] = 50) -> tuple[pl.DataFrame, list[str]]
Parse an explore-style formula and build a prediction grid.
Translates the formula via parse_explore_formula(), rejects
contrast formulas, and delegates to build_predict_grid().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | Explore-style formula (e.g. "wt ~ cyl"). | required |
data | DataFrame | Training data. | required |
response_col | str | Response column name. | required |
grouping_factors | tuple[str, ...] | Random-effect grouping variables. | required |
n_points | int | Literal['data'] | Number of grid points for continuous focal variables. | 50 |
Returns:
| Type | Description |
|---|---|
tuple[DataFrame, list[str]] | Tuple of (grid DataFrame, list of grid column names for output). The grid column names are the focal var plus any condition vars (the columns that vary across the grid, excluding reference-value columns). |
resolve_condition_values¶
resolve_condition_values(cond: Condition, data: pl.DataFrame) -> list | None
Resolve a Condition to concrete values or None.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cond | Condition | A Condition from parse_explore_formula(). | required |
data | DataFrame | The model’s training data. | required |
Returns:
| Type | Description |
|---|---|
list | None | A list of concrete values if the condition specifies explicit values (at_values, at_range, at_quantile), or None for bare conditions (use all unique levels). |
lifecycle¶
Fit lifecycle orchestration.
Owns the multi-step fit sequence: bundle rebuild → fit → post-fit state →
diagnostics augmentation. Called by model.fit() so the model class stays
a thin facade.
Classes:
| Name | Description |
|---|---|
FitResult | Immutable result of the fit lifecycle. |
Functions:
| Name | Description |
|---|---|
execute_fit | Execute the full fit lifecycle: bundle rebuild → fit → post-fit state → diagnostics. |
Classes¶
FitResult¶
Immutable result of the fit lifecycle.
Attributes:
| Name | Type | Description |
|---|---|---|
fit | FitState | Fitted model state (coefficients, residuals, etc.). |
bundle | DataBundle | Data bundle used for fitting (may be rebuilt). |
formula_spec | object | Learned formula spec for newdata evaluation. |
raw_data | DataFrame | None | Original data snapshot (pre-augmentation). |
augmented_data | DataFrame | None | Data with diagnostic columns, or None. |
varying_offsets | VaryingState | None | BLUPs for mixed models, or None. |
varying_spread | VaryingSpreadState | None | Variance components for mixed models, or None. |
Attributes¶
augmented_data¶
augmented_data: pl.DataFrame | None
bundle¶
bundle: DataBundle
fit¶
fit: FitState
formula_spec¶
formula_spec: object
raw_data¶
raw_data: pl.DataFrame | None
varying_offsets¶
varying_offsets: VaryingState | None = None
varying_spread¶
varying_spread: VaryingSpreadState | None = None
Functions¶
execute_fit¶
execute_fit(spec: ModelSpec, bundle: DataBundle | None, data: pl.DataFrame, raw_data: pl.DataFrame | None, formula: str, custom_contrasts: dict | None, weights_col: str | None, offset_col: str | None, missing: str, is_mixed: bool, solver_override: str | None, fit_kwargs: dict) -> FitResult
Execute the full fit lifecycle: bundle rebuild → fit → post-fit state → diagnostics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification. | required |
bundle | DataBundle | None | Existing data bundle, or None to force rebuild. | required |
data | DataFrame | Current data (raw_data-restored by caller). | required |
raw_data | DataFrame | None | Original pre-augmentation snapshot, or None. | required |
formula | str | Formula string for bundle building. | required |
custom_contrasts | dict | None | User contrast matrices, or None. | required |
weights_col | str | None | Weights column name, or None. | required |
offset_col | str | None | Offset column name, or None. | required |
missing | str | Missing value handling ("drop" or "fail"). | required |
is_mixed | bool | Whether this is a mixed-effects model. | required |
solver_override | str | None | Explicit solver, or None for auto. | required |
fit_kwargs | dict | Additional kwargs for fit_model(). | required |
Returns:
| Type | Description |
|---|---|
FitResult | FitResult with all state the model needs to assign. |
lmer¶
LMM fitting via Penalized Least Squares (PLS).
Functions:
| Name | Description |
|---|---|
fit_lmer_pls | Fit linear mixed-effects model using Penalized Least Squares. |
Classes¶
Functions¶
fit_lmer_pls¶
fit_lmer_pls(spec: ModelSpec, bundle: DataBundle, *, max_iter: int = 10000, verbose: bool = False) -> FitState
Fit linear mixed-effects model using Penalized Least Squares.
This adapter wraps the underlying PLS implementation. PLS is the algorithm from Bates et al. (2015) used in R’s lme4 package.
Outer loop (BOBYQA optimization over theta):
For each theta (relative covariance parameters):
1. Build Lambda (block-diagonal Cholesky factor from theta)
2. Form S_22 = Lambda’ Z’ Z Lambda + I
3. Sparse Cholesky factorization of S_22
4. Compute Schur complement for fixed effects
5. Solve for beta (fixed effects) and u (spherical RE)
6. Compute REML or ML deviance
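The per-theta steps can be sketched with dense NumPy algebra for the simplest case, a single scalar random intercept where Lambda = theta * I. This is an illustration under those assumptions, not the library's sparse implementation; pls_deviance is a hypothetical name, the deviance expressions follow the profiled ML and REML criteria in Bates et al. (2015), and the outer BOBYQA search is not shown.

```python
import numpy as np


def pls_deviance(theta, X, Z, y, reml=True):
    """Return (deviance, beta, u) for one value of theta."""
    n, p = X.shape
    q = Z.shape[1]
    ZL = Z @ (theta * np.eye(q))          # 1. Lambda from theta, then Z Lambda
    S22 = ZL.T @ ZL + np.eye(q)           # 2. Lambda' Z' Z Lambda + I
    L = np.linalg.cholesky(S22)           # 3. dense stand-in for sparse Cholesky
    A = np.linalg.solve(L, ZL.T @ X)      # L^{-1} Lambda' Z' X
    cu = np.linalg.solve(L, ZL.T @ y)     # L^{-1} Lambda' Z' y
    schur = X.T @ X - A.T @ A             # 4. Schur complement for beta
    RX = np.linalg.cholesky(schur)
    beta = np.linalg.solve(schur, X.T @ y - A.T @ cu)   # 5. fixed effects
    u = np.linalg.solve(L.T, cu - A @ beta)             #    spherical RE
    resid = y - X @ beta - ZL @ u
    pwrss = float(resid @ resid + u @ u)  # penalized weighted RSS
    logdet_L = 2.0 * float(np.sum(np.log(np.diag(L))))
    if reml:                              # 6. REML or ML deviance
        logdet_RX = 2.0 * float(np.sum(np.log(np.diag(RX))))
        dev = logdet_L + logdet_RX + (n - p) * (1.0 + np.log(2.0 * np.pi * pwrss / (n - p)))
    else:
        dev = logdet_L + n * (1.0 + np.log(2.0 * np.pi * pwrss / n))
    return dev, beta, u
```

At theta = 0 the random effects vanish and the ML branch reduces to minus twice the ordinary least squares log-likelihood, which makes a convenient sanity check.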
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification containing: - method: “reml” or “ml” (determines objective function) - random_terms: Parsed random effect specifications | required |
bundle | DataBundle | Data bundle containing: - X: Fixed effects design matrix (n x p) - Z: Random effects design matrix (n x q, sparse CSC) - y: Response vector - re_metadata: Grouping structure information | required |
max_iter | int | Maximum BOBYQA iterations (default: 10000). | 10000 |
verbose | bool | Print optimization progress (default: False). | False |
Returns:
| Type | Description |
|---|---|
FitState | FitState containing: - coef: Fixed effect coefficient estimates - vcov: Variance-covariance of fixed effects - fitted: Predicted values (fixed + random) - residuals: Response residuals (y - fitted) - leverage: Approximate leverage values - df_resid: Residual degrees of freedom - loglik: REML or ML log-likelihood - sigma: Residual standard deviation - theta: Optimized relative covariance parameters - u: Spherical random effects (unit variance) - converged: Whether optimizer converged - n_iter: Number of optimizer iterations |
See Also:
lmer: Underlying PLS implementation
ols¶
OLS fitting via QR decomposition.
Functions:
| Name | Description |
|---|---|
fit_ols_qr | Fit ordinary or weighted least squares using QR decomposition. |
Classes¶
Functions¶
fit_ols_qr¶
fit_ols_qr(spec: ModelSpec, bundle: DataBundle) -> FitState
Fit ordinary or weighted least squares using QR decomposition.
Supports observation weights (WLS) and offset terms. When weights are present, solves the transformed system sqrt(W)*X, sqrt(W)*y via QR decomposition, which yields WLS coefficients and vcov directly. Offsets are subtracted from y before fitting and added back to fitted values.
The fit proceeds as follows:
1. Subtract offset from y (if present): y_adj = y - offset
2. Apply weights (if present): X_w = sqrt(w)*X, y_w = sqrt(w)*y_adj
3. QR decompose X_w with column pivoting for stability
4. Solve R * beta = Q.T @ y_w via back-substitution
5. Recompute original-scale: fitted = X @ beta + offset, resid = y - fitted
6. vcov = sigma_w^2 * (X’WX)^{-1}
7. Leverage from (possibly weighted) hat matrix
The log-likelihood matches R’s logLik.lm formula:
L = 0.5*sum(log(w)) - n/2 * (log(2*pi) + log(RSS_w/n) + 1)
The 0.5*sum(log(w)) term is the Jacobian from the weight transformation.
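A compact NumPy sketch of the weighted QR fit described above. It is illustrative only: wls_qr_sketch is a hypothetical name, np.linalg.qr is used without column pivoting (the source mentions pivoting, which NumPy's QR does not provide), np.linalg.solve stands in for a dedicated triangular back-substitution, and sigma_w^2 = RSS_w / (n - p) is an assumption that holds for a full-rank X.

```python
import numpy as np


def wls_qr_sketch(X, y, weights=None, offset=None):
    """Return a dict of WLS results computed via a reduced QR factorization."""
    n, p = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    off = np.zeros(n) if offset is None else np.asarray(offset, dtype=float)
    sw = np.sqrt(w)
    Xw = X * sw[:, None]                  # sqrt(w) * X
    yw = (y - off) * sw                   # sqrt(w) * (y - offset)
    Q, R = np.linalg.qr(Xw)               # reduced QR, no pivoting
    beta = np.linalg.solve(R, Q.T @ yw)   # R beta = Q' y_w
    fitted = X @ beta + off               # back on the original scale
    resid = y - fitted
    rss_w = float(np.sum(w * resid**2))
    sigma2 = rss_w / (n - p)              # assumes full column rank
    R_inv = np.linalg.inv(R)
    vcov = sigma2 * (R_inv @ R_inv.T)     # (X'WX)^{-1} = R^{-1} R^{-T}
    leverage = np.sum(Q**2, axis=1)       # diagonal of the weighted hat matrix
    loglik = 0.5 * np.sum(np.log(w)) - n / 2 * (np.log(2 * np.pi) + np.log(rss_w / n) + 1)
    return {"coef": beta, "vcov": vcov, "fitted": fitted, "residuals": resid,
            "leverage": leverage, "sigma": float(np.sqrt(sigma2)), "loglik": float(loglik)}
```

The leverage values sum to the number of columns of X, which is a quick way to check the factorization.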
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification (unused for OLS, included for interface consistency with other fitters). | required |
bundle | DataBundle | Data bundle containing: - X: Design matrix (n x p) - y: Response vector (n,) - weights: Observation weights (n,) or None for OLS - offset: Offset vector (n,) or None | required |
Returns:
| Type | Description |
|---|---|
FitState | FitState containing: - coef: Coefficient estimates, shape (p,) - vcov: Variance-covariance matrix, shape (p, p) - fitted: Fitted values X @ coef + offset, shape (n,) - residuals: y - fitted, shape (n,) - leverage: Hat matrix diagonal, shape (n,) - df_resid: Residual degrees of freedom (n - rank) - loglik: Gaussian log-likelihood (weighted if applicable) - sigma: Residual standard deviation - converged: Always True (closed-form solution) - n_iter: Always 1 (single step) |
Examples:
>>> import numpy as np
>>> from containers import build_model_spec, DataBundle
>>> spec = build_model_spec(
... formula="y ~ x",
... response_var="y",
... fixed_terms=["Intercept", "x"],
... )
>>> bundle = DataBundle(
... X=np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]),
... y=np.array([2.0, 4.0, 6.0]),
... X_names=["Intercept", "x"],
... y_name="y",
... valid_mask=np.array([True, True, True]),
... n_total=3,
... )
>>> state = fit_ols_qr(spec, bundle)
>>> np.allclose(state.fitted + state.residuals, bundle.y)
True
>>> np.allclose(state.coef, [0.0, 2.0]) # Perfect fit: y = 2x
True
predict¶
Prediction operations on containers.
Pure functions for computing predictions on new data, including random effects contribution for mixed models. Extracted from model/core.py.
Classes¶
Functions¶
build_X_for_newdata¶
build_X_for_newdata(formula_spec: FormulaSpec | None, X_names: tuple[str, ...], newdata: pl.DataFrame) -> NDArray[np.float64]
Build design matrix X for new data.
Uses the stored FormulaSpec to properly handle factors, transformations (log, poly, center), and interactions. This ensures new data is encoded consistently with the training data.
When no FormulaSpec is available (e.g. simulation-only workflows), falls back to manual column stacking.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula_spec | FormulaSpec | None | Encoding state from build_design_matrices(), or None for the fallback path. | required |
X_names | tuple[str, ...] | Column names of the training design matrix. | required |
newdata | DataFrame | New data for prediction as a Polars DataFrame. | required |
Returns:
| Type | Description |
|---|---|
NDArray[float64] | Design matrix with same columns as training X, shape (n_new, p). |
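The fallback path (manual column stacking when no FormulaSpec is available) might look roughly like the sketch below. It is an assumption about the general idea, not the library's code: `stack_columns_sketch` is a hypothetical name, a plain dict of columns stands in for the Polars DataFrame, and treating `"Intercept"` as a column of ones is a guess at the convention.

```python
import numpy as np


def stack_columns_sketch(X_names, columns):
    """Build an (n_new, p) matrix by stacking columns in training order."""
    n_new = len(next(iter(columns.values())))
    stacked = []
    for name in X_names:
        if name == "Intercept":
            # Assumed convention: the intercept is a column of ones.
            stacked.append(np.ones(n_new))
        else:
            stacked.append(np.asarray(columns[name], dtype=np.float64))
    return np.column_stack(stacked)


X_new = stack_columns_sketch(("Intercept", "x"), {"x": [1.0, 2.0, 3.0]})
```

A fallback of this kind cannot reproduce factor encoding or transformations, which is why the FormulaSpec path is preferred whenever it is available.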
build_re_covariates¶
build_re_covariates(newdata: pl.DataFrame, factor_names: list[str], valid_indices: NDArray[np.intp], formula_spec: FormulaSpec | None, X_names: tuple[str, ...]) -> NDArray[np.float64]
Build random effects covariate matrix for valid newdata rows.
For each random effect term (e.g. Intercept, slope variable), extracts the appropriate covariate values from newdata for the valid rows.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
newdata | DataFrame | New data DataFrame. | required |
factor_names | list[str] | Names of random effect terms for this grouping factor (e.g. ["Intercept", "x"]). | required |
valid_indices | NDArray[intp] | Integer indices of valid (non-NA) rows in newdata. | required |
formula_spec | FormulaSpec | None | FormulaSpec for proper design matrix encoding, or None for the fallback path. | required |
X_names | tuple[str, ...] | Column names from the training design matrix. | required |
Returns:
| Type | Description |
|---|---|
NDArray[float64] | Array of shape (n_valid, n_re) with covariate values. |
compute_predictions¶
compute_predictions(spec: ModelSpec, bundle: DataBundle, fit: FitState, formula_spec: FormulaSpec | None, training_data: pl.DataFrame | None, newdata: pl.DataFrame | None, pred_type: Literal['response', 'link'], *, varying: Literal['exclude', 'include'] = 'exclude', allow_new_levels: bool = False) -> PredictionState
Compute predictions for given data.
For training data (newdata=None), returns fitted values directly.
For new data, builds the design matrix, computes the linear predictor,
optionally adds random effects, and applies the inverse link function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification (family, link, etc.). | required |
bundle | DataBundle | Training data bundle (X, X_names, rank_info, re_metadata). | required |
fit | FitState | Fitted state (coefficients, theta, u, fitted values). | required |
formula_spec | FormulaSpec | None | FormulaSpec for encoding new data, or None. | required |
training_data | DataFrame | None | Original training DataFrame for group-level mapping, or None. | required |
newdata | DataFrame | None | Data for prediction. If None, uses training data fitted values. | required |
pred_type | Literal['response', 'link'] | Prediction scale ("response" or "link"). | required |
varying | Literal['exclude', 'include'] | How to handle random effects for mixed models. "exclude" for population-level, "include" for conditional predictions with BLUPs. | 'exclude' |
allow_new_levels | bool | If True, new groups predict at population level. If False, raises ValueError for unseen groups. | False |
Returns:
| Type | Description |
|---|---|
PredictionState | PredictionState with fitted values and optional link-scale values. |
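The new-data path described above (linear predictor, optional random effects, inverse link) can be sketched as follows. The names `predict_sketch` and `inv_logit` are hypothetical, and a logit link is assumed purely for illustration; the real function takes the family and link from the ModelSpec.

```python
import numpy as np


def inv_logit(eta):
    """Inverse of the logit link (assumed here for illustration)."""
    return 1.0 / (1.0 + np.exp(-eta))


def predict_sketch(X_new, coef, inverse_link, pred_type="response",
                   re_contribution=None):
    """Linear predictor, optional RE term, then inverse link."""
    eta = X_new @ coef
    if re_contribution is not None:
        eta = eta + re_contribution       # conditional prediction
    if pred_type == "link":
        return eta
    return inverse_link(eta)


X_new = np.array([[1.0, 0.0], [1.0, 2.0]])
coef = np.array([0.0, 1.0])
on_link = predict_sketch(X_new, coef, inv_logit, pred_type="link")
on_response = predict_sketch(X_new, coef, inv_logit)
```

Adding the random effects term before applying the inverse link matters for non-identity links, because the two operations do not commute.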
compute_re_contribution¶
compute_re_contribution(re_meta: REInfo, theta: NDArray[np.floating], u: NDArray[np.floating], training_data: pl.DataFrame | None, newdata: pl.DataFrame, valid_mask: NDArray[np.bool_], allow_new_levels: bool, formula_spec: FormulaSpec | None, X_names: tuple[str, ...]) -> NDArray[np.float64]
Compute random effects contribution for new data predictions.
For each valid observation in newdata, maps group labels to trained group indices and computes the BLUP contribution as the dot product of the random effects design row and the group’s estimated BLUPs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
re_meta | REInfo | Random effects metadata from the DataBundle. | required |
theta | NDArray[floating] | Variance parameters (relative scale) from FitState. | required |
u | NDArray[floating] | Spherical random effects from FitState. | required |
training_data | DataFrame | None | Original training data (Polars DataFrame) for extracting known group levels, or None. | required |
newdata | DataFrame | New data with grouping columns. | required |
valid_mask | NDArray[bool_] | Boolean mask of valid (non-NA) rows in the design matrix, shape (n,). | required |
allow_new_levels | bool | If True, new groups get 0 RE contribution (population-level prediction). If False, raises ValueError. | required |
formula_spec | FormulaSpec | None | FormulaSpec for proper design matrix encoding (passed through to build_re_covariates). | required |
X_names | tuple[str, ...] | Column names of the training design matrix (passed through to build_re_covariates). | required |
Returns:
| Type | Description |
|---|---|
NDArray[float64] | Array of RE contributions for valid rows only, shape (n_valid,). |
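A minimal sketch of the per-row dot product for a single grouping factor is shown below. It assumes the BLUPs are already available as an `(n_groups, n_re)` array; the function name, argument names, and error message are hypothetical rather than the library's.

```python
import numpy as np


def re_contribution_sketch(groups, Z, levels, blups, allow_new_levels=False):
    """Row-wise dot product of RE covariates with the group's BLUPs."""
    index = {level: i for i, level in enumerate(levels)}
    out = np.zeros(len(groups))
    for row, group in enumerate(groups):
        if group not in index:
            if not allow_new_levels:
                raise ValueError(f"unseen group level: {group!r}")
            continue                      # unseen group: contribution stays 0
        out[row] = Z[row] @ blups[index[group]]
    return out


levels = ["a", "b"]                       # levels seen during training
blups = np.array([[1.0, 0.5],             # group "a": intercept, slope
                  [-1.0, 0.0]])           # group "b"
Z = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
contrib = re_contribution_sketch(["a", "b", "c"], Z, levels, blups,
                                 allow_new_levels=True)
```

Here the third row belongs to an unseen group `"c"`, so its contribution is zero and the prediction falls back to the population level.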
resolve_coef_for_predict¶
resolve_coef_for_predict(coef: NDArray[np.floating], rank_info: RankInfo | None) -> NDArray[np.floating]
Coefficients safe for matrix multiplication (NaN -> 0).
When rank-deficient columns produce NaN coefficients, those NaN values
must be zeroed out for X @ coef to work correctly. The dropped
columns contribute nothing to predictions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
coef | NDArray[floating] | Coefficient array, shape (p,). | required |
rank_info | RankInfo | None | Rank deficiency info from the DataBundle, or None if the design is full rank. | required |
Returns:
| Type | Description |
|---|---|
NDArray[floating] | Coefficient array with NaN replaced by 0 when the design is rank-deficient, or the original array unchanged. |
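The reason for zeroing is that a single NaN coefficient propagates into every prediction. The sketch below shows this with a hypothetical helper, `zero_nan_coef_sketch`, where a plain boolean flag stands in for the RankInfo object.

```python
import numpy as np


def zero_nan_coef_sketch(coef, rank_deficient):
    """Replace NaN coefficients by 0 only when the design is rank-deficient."""
    if not rank_deficient:
        return coef
    return np.where(np.isnan(coef), 0.0, coef)


coef = np.array([1.0, np.nan, 2.0])       # middle column was dropped
X = np.array([[1.0, 5.0, 1.0], [1.0, 7.0, 2.0]])
raw = X @ coef                            # NaN in every row
safe = X @ zero_nan_coef_sketch(coef, rank_deficient=True)
```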
validate_newdata_groups¶
validate_newdata_groups(re_meta: REInfo, training_data: pl.DataFrame | None, newdata: pl.DataFrame, allow_new_levels: bool) -> None
Validate grouping columns and levels in new data for mixed models.
Ensures that all grouping variables exist in newdata and that group
levels are a subset of those seen during training (unless
allow_new_levels is True).
Called from compute_predictions when varying="include" for
mixed models, ensuring group structure is valid before attempting
to compute BLUP contributions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
re_meta | REInfo | Random effects metadata from the DataBundle. | required |
training_data | DataFrame | None | Original training DataFrame for extracting known group levels, or None. | required |
newdata | DataFrame | New data for prediction as a Polars DataFrame. | required |
allow_new_levels | bool | If True, skip the unseen-level check. | required |
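The two checks (grouping columns present, levels seen in training) can be sketched in plain Python. Everything here is illustrative: dicts of lists stand in for the Polars DataFrames, the function name is hypothetical, and raising ValueError for a missing column is an assumption (the source only states ValueError for unseen groups).

```python
def validate_groups_sketch(group_vars, training_levels, new_columns,
                           allow_new_levels=False):
    """Check that grouping columns exist and their levels were seen."""
    for var in group_vars:
        if var not in new_columns:
            # Assumed exception type for a missing grouping column.
            raise ValueError(f"grouping variable {var!r} missing from newdata")
        if allow_new_levels:
            continue                      # skip the unseen-level check
        unseen = set(new_columns[var]) - set(training_levels[var])
        if unseen:
            raise ValueError(f"unseen levels for {var!r}: {sorted(unseen)}")


training_levels = {"subject": ["s1", "s2", "s3"]}
validate_groups_sketch(["subject"], training_levels,
                       {"subject": ["s1", "s3"]})           # passes silently
validate_groups_sketch(["subject"], training_levels,
                       {"subject": ["s9"]}, allow_new_levels=True)
```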
varying¶
Varying parameter extraction for mixed-effects models.
Extracts BLUP (Best Linear Unbiased Predictor) computation and variance component decomposition from the model class into pure functions on containers. These operations convert fitted spherical random effects into interpretable group-level parameters.
- per_factor_re_info: Split global RE metadata into per-factor structures.
- compute_varying_state: Compute BLUPs from theta and u via Lambda matrix.
- compute_varying_spread_state: Extract variance components (tau², rho, ICC).
Functions:
| Name | Description |
|---|---|
build_mixed_post_fit_state | Compute BLUPs, variance components, and emit convergence warnings. |
compute_varying_spread_state | Compute VaryingSpreadState (variance components) from theta parameters. |
compute_varying_state | Compute VaryingState (BLUPs) from fitted random effects parameters. |
per_factor_re_info | Split global RE metadata into per-factor structures and names. |
Attributes¶
Classes¶
Functions¶
build_mixed_post_fit_state¶
build_mixed_post_fit_state(fit: FitState, bundle: DataBundle, data: pl.DataFrame, *, stacklevel: int = 3) -> tuple[VaryingState | None, VaryingSpreadState | None]
Compute BLUPs, variance components, and emit convergence warnings.
Orchestrates the post-fit assembly for mixed-effects models: computes VaryingState (BLUPs) and VaryingSpreadState (variance components) from the fitted parameters, then checks for convergence issues.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fit | FitState | Fitted model state containing theta, u, sigma. | required |
bundle | DataBundle | Data bundle with RE metadata and valid mask. | required |
data | DataFrame | Original training data (used for group level labels). | required |
stacklevel | int | Warning stacklevel for convergence warnings. Default 3 accounts for: user → model.fit() → build_mixed_post_fit_state(). | 3 |
Returns:
| Type | Description |
|---|---|
tuple[VaryingState | None, VaryingSpreadState | None] | A tuple (varying_offsets, varying_spread) where either may be None if the required fitted parameters are missing. |
compute_varying_spread_state¶
compute_varying_spread_state(theta: NDArray[np.floating], sigma: float, re_meta: REInfo) -> VaryingSpreadState
Compute VaryingSpreadState (variance components) from theta parameters.
Extracts residual variance (sigma²), random effect variances (tau²), correlations (rho), and intraclass correlation (ICC) from the fitted theta vector using the random effects structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | NDArray[floating] | Variance component parameters from the fitted model. | required |
sigma | float | Residual standard deviation from the fitted model. | required |
re_meta | REInfo | Random effects metadata (grouping vars, structure, etc.). | required |
Returns:
| Type | Description |
|---|---|
VaryingSpreadState | VaryingSpreadState container with components DataFrame and decomposed variance quantities. |
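For a single grouping factor, the decomposition can be sketched as below. The sketch assumes an lme4-style parameterization, which this reference does not spell out: theta fills the lower triangle of a relative Cholesky factor column by column, and the random effects covariance is sigma² times that factor times its transpose. The ICC shown is one common definition based on the random intercept variance. All names are hypothetical.

```python
import numpy as np


def spread_sketch(theta, sigma, n_re):
    """tau^2, correlations, and ICC for one grouping factor (illustrative)."""
    L = np.zeros((n_re, n_re))
    k = 0
    for j in range(n_re):                 # assumed column-major fill order
        for i in range(j, n_re):
            L[i, j] = theta[k]
            k += 1
    cov = sigma**2 * (L @ L.T)            # RE covariance on the response scale
    tau2 = np.diag(cov)
    sd = np.sqrt(tau2)
    corr = cov / np.outer(sd, sd)         # undefined if any variance is 0
    icc = tau2[0] / (tau2[0] + sigma**2)  # intercept variance share
    return tau2, corr, icc


tau2, corr, icc = spread_sketch(np.array([1.0, 0.5, 0.5]), sigma=2.0, n_re=2)
```

A theta element at its lower bound of zero gives a zero variance, in which case the correlation is undefined and needs separate handling.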
compute_varying_state¶
compute_varying_state(theta: NDArray[np.floating], u: NDArray[np.floating], re_meta: REInfo, data: pl.DataFrame | None = None) -> VaryingState
Compute VaryingState (BLUPs) from fitted random effects parameters.
Converts spherical random effects u to BLUPs b = Lambda @ u
using the relative covariance factor Lambda built from theta.
Constructs a grid of group/level combinations and maps BLUP values
to named effects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | NDArray[floating] | Variance component parameters from the fitted model. | required |
u | NDArray[floating] | Spherical random effects vector from the fitted model. | required |
re_meta | REInfo | Random effects metadata (grouping vars, structure, etc.). | required |
data | DataFrame | None | Original training data, used to extract unique group levels. If None, levels are labeled "0", "1", etc. | None |
Returns:
| Type | Description |
|---|---|
VaryingState | VaryingState container with grid, effects dict, and group info. |
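The mapping b = Lambda @ u can be sketched for a single grouping factor as follows. The layout is an assumption made for illustration: Lambda is block-diagonal with one copy of a lower-triangular template per group, theta fills the template column by column, and u is ordered group by group. `blups_sketch` is a hypothetical name.

```python
import numpy as np


def blups_sketch(theta, u, n_re):
    """Return BLUPs as an (n_groups, n_re) array via b = Lambda @ u."""
    template = np.zeros((n_re, n_re))
    k = 0
    for j in range(n_re):                 # assumed column-major fill order
        for i in range(j, n_re):
            template[i, j] = theta[k]
            k += 1
    n_groups = u.size // n_re
    Lambda = np.kron(np.eye(n_groups), template)  # block-diagonal factor
    b = Lambda @ u
    return b.reshape(n_groups, n_re)


# Two groups, random intercept and slope per group.
b = blups_sketch(np.array([1.0, 0.5, 0.5]),
                 np.array([1.0, 2.0, 0.0, -2.0]), n_re=2)
```

Building the dense Kronecker product is only sensible for small examples; a sparse or per-group multiplication gives the same result at scale.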
per_factor_re_info¶
per_factor_re_info(re_meta: REInfo, group_names: list[str]) -> tuple[str | list[str], list[str] | dict[str, list[str]]]
Split global RE metadata into per-factor structures and names.
For crossed/nested/mixed models, the global re_structure is a single
string (e.g. "crossed") and random_names is a concatenated list across
all factors. This function splits them into per-factor structures and
per-factor name dicts suitable for BLUP decomposition and convergence
diagnostics.
For single-factor models, returns the originals unchanged.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
re_meta | REInfo | Random effects metadata from the fitted model’s DataBundle. | required |
group_names | list[str] | Ordered list of grouping variable names (e.g. ["subject"] or ["subject", "item"]). | required |
Returns:
| Type | Description |
|---|---|
tuple[str | list[str], list[str] | dict[str, list[str]]] | A tuple (re_structure, random_names) where: - For single-factor models: (str, list[str]), the originals. - For multi-factor models: (list[str], dict[str, list[str]]), a per-factor structure list and a dict mapping group name to its random effect names. |
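The name-splitting half of this can be sketched in plain Python. The sketch assumes the number of random effect terms per factor is known; how REInfo actually records that is not shown in this reference, so `n_terms_per_group` and the function name are hypothetical.

```python
def split_random_names_sketch(random_names, group_names, n_terms_per_group):
    """Slice a concatenated name list into a dict keyed by group name."""
    if len(group_names) == 1:
        return random_names               # single factor: return unchanged
    per_factor = {}
    start = 0
    for group, n_terms in zip(group_names, n_terms_per_group):
        per_factor[group] = random_names[start:start + n_terms]
        start += n_terms
    return per_factor


names = split_random_names_sketch(
    ["Intercept", "x", "Intercept"],      # concatenated across factors
    ["subject", "item"],
    [2, 1],                               # hypothetical per-factor counts
)
```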