Model-specific fitting algorithms (IRLS, PLS, PIRLS).
Classes:
| Name | Description |
|---|---|
LambdaTemplate | Template for efficient Lambda matrix updates during optimization. |
PLSInvariants | Pre-computed quantities that don’t change during theta optimization. |
PatternTemplate | Template for preserving sparsity patterns across theta evaluations. |
Functions:
| Name | Description |
|---|---|
apply_sqrt_weights | Apply sqrt(weights) transformation to design matrices and response. |
build_lambda_sparse | Build sparse block-diagonal Lambda matrix from theta. |
build_lambda_template | Build a Lambda template for efficient repeated updates. |
compute_agq_deviance | Compute AGQ deviance for Stage 2 optimization. |
compute_irls_quantities | Compute IRLS working weights and working response. |
compute_pls_invariants | Pre-compute quantities that are constant during optimization. |
fit_glm_irls | Fit GLM using IRLS algorithm. |
fit_glmm_pirls | Fit GLMM using PIRLS with outer optimization over theta. |
glmm_deviance | Compute GLMM deviance via Laplace approximation. |
glmm_deviance_objective | Compute GLMM deviance for outer optimization. |
lmm_deviance_sparse | Compute LMM deviance for optimization. |
optimize_theta | Optimize theta using BOBYQA via NLOPT. |
solve_pls_sparse | Solve Penalized Least Squares system using Schur complement. |
solve_weighted_pls_sparse | Solve weighted Penalized Least Squares for GLMM. |
theta_to_cholesky_block | Convert theta vector to lower-triangular Cholesky block. |
update_lambda_from_template | Update Lambda matrix from template using new theta values. |
Modules:
| Name | Description |
|---|---|
glm | GLM IRLS fitting algorithm. |
glmer | PIRLS (Penalized Iteratively Reweighted Least Squares) algorithm for GLMMs. |
heuristics | Heuristics and numerical utilities for GLMM fitting. |
initialization | Moment-based initialization for LMM variance components. |
lambda_builder | Lambda matrix construction from theta parameters. |
lambda_sparse | Sparse Lambda matrix construction and theta-Cholesky conversions. |
lambda_template | Lambda matrix templates and sparsity pattern preservation. |
lmer | Core LMM operations: PLS solving and deviance computation. |
optimize | BOBYQA optimizer via NLOPT for variance component optimization. |
pirls_sparse | Sparse PIRLS (Penalized Iteratively Reweighted Least Squares) core routines. |
quadrature | Adaptive Gauss-Hermite Quadrature (AGQ) for GLMM marginal likelihood. |
Classes¶
LambdaTemplate¶
LambdaTemplate(template: sp.csc_matrix, theta_to_data_map: list[np.ndarray], re_structure: str, n_groups_list: list[int]) -> None

Template for efficient Lambda matrix updates during optimization.
Caches the sparsity pattern of Lambda so that repeated calls only update the data values, avoiding reconstruction of indices.
Attributes:
| Name | Type | Description |
|---|---|---|
template | csc_matrix | Sparse CSC matrix with correct structure but placeholder values. |
theta_to_data_map | list[ndarray] | List mapping theta indices to positions in template.data. Each element is an int64 array of indices into template.data. |
re_structure | str | The random effects structure type. |
n_groups_list | list[int] | Number of groups per factor. |
Attributes¶
n_groups_list¶
n_groups_list: list[int]

re_structure¶
re_structure: str

template¶
template: sp.csc_matrix

theta_to_data_map¶
theta_to_data_map: list[np.ndarray]

PLSInvariants¶
PLSInvariants(XtX: np.ndarray, Xty: np.ndarray, X_backend: np.ndarray, y_backend: np.ndarray) -> None

Pre-computed quantities that don’t change during theta optimization.
These are computed once when the model is set up and reused for every deviance evaluation during BOBYQA optimization.
Attributes:
| Name | Type | Description |
|---|---|---|
XtX | ndarray | Gram matrix X’X, shape (p, p). Backend array (JAX or NumPy). |
Xty | ndarray | Cross-product X’y, shape (p,). Backend array. |
X_backend | ndarray | Fixed effects design matrix as backend array, shape (n, p). |
y_backend | ndarray | Response vector as backend array, shape (n,). |
Attributes¶
X_backend¶
X_backend: np.ndarray

XtX¶
XtX: np.ndarray

Xty¶
Xty: np.ndarray

y_backend¶
y_backend: np.ndarray

PatternTemplate¶
PatternTemplate(ZL_pattern: sp.csc_matrix, S22_pattern: sp.csc_matrix, Z: sp.csc_matrix, lambda_template: 'LambdaTemplate') -> None

Template for preserving sparsity patterns across theta evaluations.
This is critical for cross-theta Cholesky caching. When theta has boundary values (theta=0), the resulting matrices would normally have fewer non-zeros. This template ensures the sparsity pattern remains constant by storing explicit zeros where needed.
Used to match lme4’s behavior where the sparsity pattern is fixed at initialization and never changes during optimization.
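The explicit-zeros idea can be sketched in a few lines (an illustrative example with hypothetical helper names, not the library's implementation): fix the sparsity pattern at theta = ones, then only overwrite the stored values, so a boundary theta of 0 becomes an explicit zero rather than a dropped entry.

```python
import numpy as np
import scipy.sparse as sp

# Sketch: for a random-intercept model, Lambda is diagonal, so theta = 1
# yields a full diagonal sparsity pattern that we freeze once.
def make_pattern(n_groups: int) -> sp.csc_matrix:
    return sp.identity(n_groups, format="csc")

def update_values(pattern: sp.csc_matrix, theta0: float) -> sp.csc_matrix:
    out = pattern.copy()
    out.data[:] = theta0   # theta0 = 0 stores explicit zeros; pattern unchanged
    return out

pattern = make_pattern(4)
lam = update_values(pattern, 0.0)
assert lam.nnz == 4        # explicit zeros keep nnz (and the pattern) constant
```

Because the pattern never changes, a cached symbolic Cholesky analysis of the downstream matrices remains valid across theta evaluations.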
Attributes:
| Name | Type | Description |
|---|---|---|
ZL_pattern | csc_matrix | Sparsity pattern of Z @ Lambda (computed with theta=ones). |
S22_pattern | csc_matrix | Sparsity pattern of ZL’ @ ZL + I (computed with theta=ones). |
Z | csc_matrix | Reference to the original Z matrix for value computation. |
lambda_template | ‘LambdaTemplate’ | Reference to the LambdaTemplate. |
Attributes¶
S22_pattern¶
S22_pattern: sp.csc_matrix

Z¶
Z: sp.csc_matrix

ZL_pattern¶
ZL_pattern: sp.csc_matrix

lambda_template¶
lambda_template: 'LambdaTemplate'

Functions¶
apply_sqrt_weights¶
apply_sqrt_weights(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, weights: np.ndarray | None) -> tuple[np.ndarray, sp.csc_matrix, np.ndarray, np.ndarray | None]

Apply sqrt(weights) transformation to design matrices and response.
Transforms the weighted LMM problem into an unweighted problem on transformed data. This is the standard approach from lme4 and MixedModels.jl.
minimize ||W^{1/2}(y - Xβ - ZΛu)||² + ||u||²

After transformation (X_w, Z_w, y_w) = (W^{1/2}X, W^{1/2}Z, W^{1/2}y):

minimize ||y_w - X_w·β - Z_w·Λu||² + ||u||²
This is the standard unweighted PLS, so the same solver applies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X | ndarray | Fixed effects design matrix, shape (n, p). | required |
Z | csc_matrix | Random effects design matrix, sparse csc, shape (n, q). | required |
y | ndarray | Response vector, shape (n,). | required |
weights | ndarray | None | Observation weights, shape (n,), or None for unweighted. | required |
Returns:
| Type | Description |
|---|---|
tuple[ndarray, csc_matrix, ndarray, ndarray | None] | Tuple of (X_w, Z_w, y_w, sqrtwts) where: X_w: Transformed X, shape (n, p) Z_w: Transformed Z, sparse csc, shape (n, q) y_w: Transformed y, shape (n,) sqrtwts: sqrt(weights), shape (n,), or None if weights is None |
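The transformation itself is a one-liner per matrix. A minimal sketch of the idea (illustrative code, not the library's `apply_sqrt_weights` itself):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n, p, q = 8, 2, 3
X = rng.standard_normal((n, p))
Z = sp.random(n, q, density=0.5, format="csc", random_state=0)
y = rng.standard_normal(n)
w = rng.uniform(0.5, 2.0, size=n)

sqrtwts = np.sqrt(w)
X_w = sqrtwts[:, None] * X                # W^{1/2} X
Z_w = sp.diags(sqrtwts).tocsc() @ Z       # W^{1/2} Z, stays sparse CSC
y_w = sqrtwts * y                         # W^{1/2} y

# The weighted residual norm equals the unweighted norm on transformed data.
beta = np.zeros(p)
r = y - X @ beta
assert np.isclose(np.sum(w * r**2), np.sum((y_w - X_w @ beta)**2))
```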
build_lambda_sparse¶
build_lambda_sparse(theta: np.ndarray, n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> sp.csc_matrix

Build sparse block-diagonal Lambda matrix from theta.
Lambda is constructed based on the random effects structure:
- Intercept: Diagonal with theta[i] repeated
- Diagonal: Block-diagonal with diagonal blocks
- Slope: Block-diagonal with full Cholesky blocks
- Nested/Crossed: Concatenated blocks per grouping factor
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (lower triangle, column-major). | required |
n_groups_list | list[int] | Number of groups per grouping factor. For simple models: [n_groups] For nested/crossed: [n_groups_factor1, n_groups_factor2, ...] | required |
re_structure | str | Random effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed” | required |
metadata | dict | None | Optional metadata dict with structure information. May contain ‘re_structures_list’ for mixed models. | None |
Returns:
| Type | Description |
|---|---|
csc_matrix | Sparse block-diagonal Lambda matrix, CSC format. |
Examples:
>>> # Random intercept: (1|group)
>>> theta = np.array([1.5])
>>> Lambda = build_lambda_sparse(theta, [10], "intercept")
>>> Lambda.shape
(10, 10)
>>> # Correlated slopes: (1+x|group)
>>> theta = np.array([1.5, 0.3, 1.2]) # [L00, L10, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "slope")
>>> Lambda.shape
(36, 36)  # 18 groups x 2 RE per group
>>> # Uncorrelated slopes: (1+x||group)
>>> theta = np.array([1.5, 1.2]) # [L00, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "diagonal")
>>> Lambda.shape
(36, 36)

Notes:
- Returns CSC format for compatibility with CHOLMOD
- Blocks are arranged per lme4’s conventions
- For diagonal structures, uses a blocked layout (all intercepts, then all slopes)
build_lambda_template¶
build_lambda_template(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> LambdaTemplate

Build a Lambda template for efficient repeated updates.
Creates a sparse matrix template with the correct sparsity pattern, plus a mapping from theta indices to data array positions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n_groups_list | list[int] | Number of groups per grouping factor. | required |
re_structure | str | Random effects structure type. | required |
metadata | dict | None | Optional metadata dict with structure information. | None |
Returns:
| Type | Description |
|---|---|
LambdaTemplate | LambdaTemplate with template matrix and theta-to-data mapping. |
Examples:
>>> # Create template for random intercept model
>>> template = build_lambda_template([18], "intercept")
>>> template.template.shape
(18, 18)
>>> # Update with new theta values
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda.data[0]
1.5

Notes:
- Template is built once per model, reused during optimization
- theta_to_data_map[i] gives indices in template.data for theta[i]
compute_agq_deviance¶
compute_agq_deviance(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, group_ids: np.ndarray, nAGQ: int, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, eta_init: np.ndarray | None = None, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None) -> float

Compute AGQ deviance for Stage 2 optimization.
This is the objective function for nAGQ > 1. It:

1. Runs PIRLS with beta_fixed to get conditional modes u_hat
2. Computes per-group conditional SDs from the S22 diagonal
3. Evaluates the AGQ integral using Gauss-Hermite quadrature
4. Returns -2 * sum_i(log_integral_i)
Only valid for scalar random intercept models (single grouping factor, intercept only).
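The quadrature step (step 3 above) can be sketched as follows. This is an illustrative helper with a hypothetical name, not the library's API: nodes are centered at the conditional mode with the conditional SD (the "adaptive" part), and the Gauss-Hermite weight is divided out of the integrand.

```python
import numpy as np

def agq_log_integral(log_f, u_hat, sd, n_agq):
    # Nodes/weights for the weight function exp(-x^2).
    x, w = np.polynomial.hermite.hermgauss(n_agq)
    u = u_hat + np.sqrt(2.0) * sd * x      # adaptive change of variables
    # \int f(u) du ~= sqrt(2)*sd * sum_k w_k * exp(x_k^2) * f(u_k)
    vals = w * np.exp(x**2 + log_f(u))
    return np.log(np.sqrt(2.0) * sd * np.sum(vals))

# Check: integrating a N(0.3, 0.5^2) density over R gives 1 (log = 0),
# exactly, when the nodes are centered at its mode with its SD.
log_norm = lambda u: -0.5 * ((u - 0.3) / 0.5) ** 2 - np.log(0.5 * np.sqrt(2 * np.pi))
assert abs(agq_log_integral(log_norm, u_hat=0.3, sd=0.5, n_agq=9)) < 1e-8
```

In the real objective, `log_f` would be the unnormalized log joint of one group's observations and its random effect, evaluated per group.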
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (scalar for random intercept). | required |
beta | ndarray | Fixed effects coefficients (p,). | required |
X | ndarray | Fixed effects design matrix (n, p). | required |
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse. | required |
y | ndarray | Response vector (n,). | required |
family | Family | GLM family (binomial, poisson). | required |
n_groups_list | list[int] | Number of groups per grouping factor. | required |
re_structure | str | Random effects structure type. | required |
group_ids | ndarray | Group membership array (n,), integer-coded. | required |
nAGQ | int | Number of quadrature points (must be > 1). | required |
metadata | dict | None | Additional RE metadata. | None |
prior_weights | ndarray | None | Prior observation weights. | None |
pirls_max_iter | int | Maximum PIRLS iterations per evaluation. | 30 |
pirls_tol | float | PIRLS convergence tolerance. | 1e-07 |
eta_init | ndarray | None | Initial linear predictor for PIRLS. | None |
lambda_template | ‘LambdaTemplate | None’ | Pre-built template for efficient Lambda updates. | None |
factor_cache | dict | None | Mutable dict for caching Cholesky factors. | None |
Returns:
| Type | Description |
|---|---|
float | AGQ deviance (scalar) = -2 * sum(log marginal likelihood). |
compute_irls_quantities¶
compute_irls_quantities(y: np.ndarray, eta: np.ndarray, family: Family) -> tuple[np.ndarray, np.ndarray]

Compute IRLS working weights and working response.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y | ndarray | Response values (n,) | required |
eta | ndarray | Linear predictor (n,) | required |
family | Family | GLM family object | required |
Returns:
| Name | Type | Description |
|---|---|---|
working_weights | ndarray | IRLS weights w = 1 / (V(mu) * (deta/dmu)^2) |
working_response | ndarray | Pseudo-response z = eta + (y - mu) * deta/dmu |
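For a concrete case, the formulas above specialize nicely for Poisson with log link: V(mu) = mu and deta/dmu = 1/mu, so w = mu and z = eta + (y - mu)/mu. A sketch (not the library's `compute_irls_quantities`, which dispatches on the Family object):

```python
import numpy as np

def poisson_irls_quantities(y, eta):
    mu = np.exp(eta)
    working_weights = mu                     # 1 / (V(mu) * (deta/dmu)^2)
    working_response = eta + (y - mu) / mu   # eta + (y - mu) * deta/dmu
    return working_weights, working_response

y = np.array([0.0, 2.0, 5.0])
eta = np.zeros(3)                            # mu = 1 everywhere
w, z = poisson_irls_quantities(y, eta)
assert np.allclose(w, 1.0)
assert np.allclose(z, y - 1.0)               # eta + (y - 1)/1
```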
compute_pls_invariants¶
compute_pls_invariants(X: np.ndarray, y: np.ndarray) -> PLSInvariants

Pre-compute quantities that are constant during optimization.
Call this once before the optimization loop to avoid redundant computation. X’X and X’y don’t depend on theta, so computing them once saves work.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X | ndarray | Fixed effects design matrix, shape (n, p). | required |
y | ndarray | Response vector, shape (n,). | required |
Returns:
| Type | Description |
|---|---|
PLSInvariants | PLSInvariants with pre-computed X’X, X’y, and backend arrays. |
Examples:
>>> X = np.random.randn(100, 3)
>>> y = np.random.randn(100)
>>> inv = compute_pls_invariants(X, y)
>>> inv.XtX.shape
(3, 3)

fit_glm_irls¶
fit_glm_irls(y: np.ndarray, X: np.ndarray, family: Family, weights: np.ndarray | None = None, max_iter: int = 25, tol: float = 1e-08) -> dict

Fit GLM using IRLS algorithm.
Implements Iteratively Reweighted Least Squares for generalized linear models. The algorithm iterates until convergence (relative deviance change < tol) or max_iter is reached.
The core fitting loop is JIT-compiled for efficiency. All family-specific operations (link, variance, deviance) are pure JAX functions from the Family object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y | ndarray | Response values (n,) | required |
X | ndarray | Design matrix (n, p) | required |
family | Family | Family object with link, variance, and deviance functions | required |
weights | ndarray | None | Prior observation weights (n,) or None for equal weights | None |
max_iter | int | Maximum IRLS iterations (default: 25) | 25 |
tol | float | Convergence tolerance for relative deviance change (default: 1e-8) | 1e-08 |
Returns:
| Type | Description |
|---|---|
dict | Dictionary with keys: - coef: Coefficient estimates (p,) - vcov: Variance-covariance matrix (p, p) - dispersion: Dispersion parameter estimate - fitted: Fitted values μ on response scale (n,) - linear_predictor: Linear predictor η on link scale (n,) - residuals: Response residuals y - μ (n,) - deviance: Residual deviance - null_deviance: Null model deviance - df_residual: Residual degrees of freedom - loglik: Log-likelihood - converged: True if converged, False otherwise - n_iter: Number of iterations performed - has_separation: Separation detection (binomial only, None otherwise) - irls_weights: Final IRLS weights (n,) |
Notes:
- Uses while_loop for early stopping on convergence
- Clips IRLS weights to [1e-10, 1e10] for numerical stability
- Clips μ values based on family type (binomial: [1e-10, 1-1e-10], poisson: [1e-10, inf])
- Adds small regularization (1e-10) to diagonal of XtWX
- Separation detection checks whether fitted probabilities are extremely close to 0 or 1 (< 1e-6 or > 1-1e-6) for binomial models
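A bare-bones version of the IRLS iteration described above, for a Poisson/log GLM (an illustrative sketch: the JIT compilation, weight/μ clipping, regularization, and separation checks noted above are omitted):

```python
import numpy as np

def irls_poisson(y, X, max_iter=25, tol=1e-8):
    beta = np.zeros(X.shape[1])
    dev_old = np.inf
    for it in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        w = mu                                # IRLS weights for Poisson/log
        z = eta + (y - mu) / mu               # working response
        WX = w[:, None] * X
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)   # weighted least squares
        with np.errstate(divide="ignore", invalid="ignore"):
            dev = 2 * np.sum(np.where(y > 0, y * np.log(y / mu), 0.0) - (y - mu))
        if abs(dev_old - dev) < tol * (abs(dev) + 0.1):
            return beta, it + 1
        dev_old = dev
    return beta, max_iter

# Intercept-only Poisson: the MLE is log(mean(y)), which IRLS recovers.
y = np.array([1.0, 3.0, 2.0, 4.0, 2.0])
X = np.ones((5, 1))
beta, n_iter = irls_poisson(y, X)
assert np.isclose(beta[0], np.log(y.mean()), atol=1e-6)
```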
fit_glmm_pirls¶
fit_glmm_pirls(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, theta_init: np.ndarray | None = None, prior_weights: np.ndarray | None = None, max_outer_iter: int = 10000, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, two_stage: bool = True, lambda_template: 'LambdaTemplate | None' = None, nAGQ: int = 1, group_ids: np.ndarray | None = None) -> dict

Fit GLMM using PIRLS with outer optimization over theta.
Optimization strategy (two stages, matching lme4):

- Stage 1 (nAGQ=0): BOBYQA optimizes theta only; PIRLS finds the optimal (beta, u) for each theta.
- Stage 2 (nAGQ>=1): Nelder-Mead optimizes [theta, beta] jointly; beta is held fixed in PIRLS (added to the offset) so PIRLS only solves for u. This lets the optimizer find the global (theta, beta) optimum.
When nAGQ > 1, Stage 2 uses adaptive Gauss-Hermite quadrature instead of the Laplace approximation for more accurate marginal likelihood.
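The Stage 2 structure can be sketched as follows. This is purely illustrative: the names are hypothetical, and a toy quadratic stands in for the PIRLS-based Laplace/AGQ deviance that the real code evaluates inside the objective.

```python
import numpy as np
from scipy.optimize import minimize

n_theta, n_beta = 1, 2

def stage2_objective(params):
    theta, beta = params[:n_theta], params[n_theta:]
    # Stand-in for: run PIRLS with beta_fixed=beta, return the deviance.
    return (theta[0] - 0.8) ** 2 + np.sum((beta - np.array([1.0, -0.5])) ** 2)

# Start Stage 2 from the Stage 1 estimates, packed into one vector.
x0 = np.concatenate([np.array([1.0]), np.zeros(n_beta)])
res = minimize(stage2_objective, x0, method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8})
assert np.allclose(res.x, [0.8, 1.0, -0.5], atol=1e-3)
```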
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X | ndarray | Fixed effects design matrix (n, p) | required |
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse | required |
y | ndarray | Response vector (n,) | required |
family | Family | GLM family (binomial, poisson) | required |
n_groups_list | list[int] | Number of groups per grouping factor | required |
re_structure | str | Random effects structure type | required |
metadata | dict | None | Additional RE metadata | None |
theta_init | ndarray | None | Initial theta values (optional) | None |
prior_weights | ndarray | None | Prior observation weights | None |
max_outer_iter | int | Maximum BOBYQA iterations for Stage 1 | 10000 |
pirls_max_iter | int | Maximum PIRLS iterations per theta evaluation | 30 |
pirls_tol | float | PIRLS convergence tolerance (1e-7 matches lme4) | 1e-07 |
verbose | bool | Print optimization progress | False |
two_stage | bool | Use two-stage optimization matching lme4 (default True) | True |
lambda_template | ‘LambdaTemplate | None’ | Optional pre-built template for efficient Lambda updates. If provided, uses this template instead of building a new one. This is useful for bootstrap where the same template can be reused. | None |
nAGQ | int | Number of adaptive Gauss-Hermite quadrature points (default 1). nAGQ=1 uses the Laplace approximation. nAGQ > 1 uses AGQ for more accurate integration (scalar random intercept only). | 1 |
group_ids | ndarray | None | Group membership array (n,), integer-coded. Required when nAGQ > 1 for per-group quadrature. | None |
Returns:
| Type | Description |
|---|---|
dict | Dictionary with: |
dict | - theta: Optimized theta parameters |
dict | - beta: Fixed effects coefficients |
dict | - u: Spherical random effects |
dict | - eta: Linear predictor |
dict | - mu: Fitted values |
dict | - deviance: Final GLMM deviance |
dict | - loglik: Laplace log-likelihood |
dict | - converged: Whether optimizer converged |
dict | - n_outer_iter: Number of outer iterations |
dict | - n_func_evals: Number of deviance evaluations |
dict | - pirls_converged: Whether final PIRLS converged |
dict | - pirls_n_iter: Final PIRLS iterations |
glmm_deviance¶
glmm_deviance(y: np.ndarray, mu: np.ndarray, family: Family, logdet: float, sqrL: float, prior_weights: np.ndarray | None = None) -> float

Compute GLMM deviance via Laplace approximation.

deviance_GLMM = sum(dev_resid) + logdet + ||u||^2
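Assembling the formula directly (a sketch; `dev_resid` here is the Poisson unit deviance, one of the families the real function supports via the Family object):

```python
import numpy as np

def poisson_dev_resid(y, mu):
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(y > 0, y * np.log(y / mu), 0.0)
    return 2.0 * (term - (y - mu))

y = np.array([1.0, 0.0, 3.0])
mu = np.array([1.2, 0.4, 2.5])
u = np.array([0.1, -0.2])       # spherical random effects
logdet = 0.7                    # log|L'Z'WZL + I| from the sparse factorization
deviance = np.sum(poisson_dev_resid(y, mu)) + logdet + np.sum(u**2)
```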
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y | ndarray | Response values (n,) | required |
mu | ndarray | Fitted values mu = g^{-1}(eta) (n,) | required |
family | Family | GLM family | required |
logdet | float | Log-determinant log\|L’Z’WZL + I\| | required |
sqrL | float | Sum of squared spherical random effects | required |
prior_weights | ndarray | None | Prior weights for observations (n,). For binomial with proportions, this is the number of trials. | None |
Returns:
| Type | Description |
|---|---|
float | GLMM deviance (scalar) |
glmm_deviance_objective¶
glmm_deviance_objective(theta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None, pattern_template: 'PatternTemplate | None' = None, pirls_result_cache: dict | None = None) -> float

Compute GLMM deviance for outer optimization.
This is the objective function for BOBYQA optimization over theta.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (lower triangle, column-major) | required |
X | ndarray | Fixed effects design matrix (n, p) | required |
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse | required |
y | ndarray | Response vector (n,) | required |
family | Family | GLM family (binomial, poisson) | required |
n_groups_list | list[int] | Number of groups per grouping factor | required |
re_structure | str | Random effects structure type | required |
metadata | dict | None | Additional RE metadata | None |
prior_weights | ndarray | None | Prior observation weights | None |
pirls_max_iter | int | Maximum PIRLS iterations per theta evaluation | 30 |
pirls_tol | float | PIRLS convergence tolerance (1e-7 matches lme4) | 1e-07 |
verbose | bool | Print verbose output | False |
lambda_template | ‘LambdaTemplate | None’ | Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse. This avoids rebuilding the sparsity pattern on each call (~10-15% speedup). | None |
factor_cache | dict | None | Optional mutable dict for caching Cholesky factors across theta evaluations. Should contain keys ‘factor_S22’ and ‘factor_S’. Pass an empty dict {} to enable caching; factors will be stored and reused across calls, avoiding repeated symbolic analysis. | None |
pattern_template | ‘PatternTemplate | None’ | PatternTemplate for preserving sparsity patterns. Required for cross-theta factor caching. When provided, ensures S22 has consistent sparsity pattern even when theta hits boundaries (theta=0). | None |
pirls_result_cache | dict | None | Optional mutable dict for caching the last PIRLS result. When provided, stores {"theta": theta_copy, "result": pirls_result} on each call so the caller can reuse the result at the converged theta without a redundant PIRLS evaluation. | None |
Returns:
| Type | Description |
|---|---|
float | GLMM deviance (scalar) - the Laplace approximation of -2*logLik |
lmm_deviance_sparse¶
lmm_deviance_sparse(theta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, n_groups_list: list[int], re_structure: str, method: str = 'REML', lambda_template: 'LambdaTemplate | None' = None, pls_invariants: PLSInvariants | None = None, metadata: dict | None = None, sqrtwts: np.ndarray | None = None) -> float

Compute LMM deviance for optimization.
The deviance is -2 * log-likelihood, profiled over β and σ².
-2 log L = log|Λ’Z’ZΛ+I| + n·log(2π·PWRSS/n) + n
-2 REML = log|Λ’Z’ZΛ+I| + log|X’V⁻¹X| + (n-p)·log(2π·PWRSS/(n-p)) + (n-p)
For weighted models, the deviance includes an additional adjustment: -log|W| = -2*sum(log(sqrtwts))
This follows lme4 and MixedModels.jl: X, Z, y should already be pre-transformed by sqrt(weights) using apply_sqrt_weights().
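Given the log-determinants and PWRSS from the PLS solve, the two criteria above are simple closed forms. A sketch with hypothetical helper names (the real function obtains logdet, logdet_RX, and PWRSS from the sparse solver internally):

```python
import numpy as np

def profiled_ml_deviance(logdet, pwrss, n):
    # -2 log L = log|Lambda'Z'ZLambda + I| + n*log(2*pi*PWRSS/n) + n
    return logdet + n * (np.log(2.0 * np.pi * pwrss / n) + 1.0)

def profiled_reml_criterion(logdet, logdet_RX, pwrss, n, p):
    # Adds log|X'V^{-1}X| and replaces n by n - p.
    nmp = n - p
    return logdet + logdet_RX + nmp * (np.log(2.0 * np.pi * pwrss / nmp) + 1.0)

# Sanity check: with p = 0 and logdet_RX = 0, REML reduces to ML.
assert np.isclose(profiled_reml_criterion(0.5, 0.0, 10.0, 100, 0),
                  profiled_ml_deviance(0.5, 10.0, 100))
```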
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (lower triangle, column-major). | required |
X | ndarray | Fixed effects design matrix, shape (n, p). Should be pre-transformed by sqrt(weights) if weighted. | required |
Z | csc_matrix | Random effects design matrix, shape (n, q), scipy sparse. Should be pre-transformed by sqrt(weights) if weighted. | required |
y | ndarray | Response vector, shape (n,). Should be pre-transformed by sqrt(weights) if weighted. | required |
n_groups_list | list[int] | Number of groups per grouping factor. | required |
re_structure | str | Random effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed” | required |
method | str | Estimation method, either “ML” or “REML”. | ‘REML’ |
lambda_template | ‘LambdaTemplate | None’ | Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse. This avoids rebuilding the sparsity pattern on each call. | None |
pls_invariants | PLSInvariants | None | Optional pre-computed invariants (X’X, X’y, JAX arrays). If provided, avoids redundant matrix computation in optimization loop. Use compute_pls_invariants() to create this before the loop. | None |
metadata | dict | None | Optional metadata dict with structure information. Required for crossed/nested/mixed structures with non-intercept factors. Should contain ‘re_structures_list’ specifying each factor’s structure. | None |
sqrtwts | ndarray | None | Optional sqrt(weights), shape (n,). If provided, adds the log-determinant adjustment -2*sum(log(sqrtwts)) to the deviance. This compensates for the Jacobian from the weight transformation. | None |
Returns:
| Type | Description |
|---|---|
float | Deviance value (scalar). |
Examples:
>>> import numpy as np
>>> import scipy.sparse as sp
>>> # Simple random intercept model
>>> theta = np.array([1.5])
>>> X = np.random.randn(100, 2)
>>> Z = sp.random(100, 10, density=0.1, format='csc')
>>> y = np.random.randn(100)
>>> dev = lmm_deviance_sparse(theta, X, Z, y, [10], "intercept", "REML")
>>> dev > 0
True

Notes:
- Requires lambda_builder module for building Λ from theta
- Returns ML deviance if method="ML", REML criterion if method="REML"
- Used as objective function in BOBYQA optimization
- Pass lambda_template for ~10-20% speedup in optimization loop
- Pass pls_invariants for additional speedup (avoids X’X, X’y computation)
- For weighted models, pass sqrtwts and pre-transformed X, Z, y
optimize_theta¶
optimize_theta(objective: Callable, theta0: np.ndarray, lower: np.ndarray, upper: np.ndarray, rhobeg: float = 0.002, rhoend: float = 2e-07, maxfun: int = 10000, verbose: bool = False) -> dict

Optimize theta using BOBYQA via NLOPT.
BOBYQA is Powell’s derivative-free trust-region method for bound-constrained optimization. This implementation uses NLOPT’s LN_BOBYQA algorithm.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
objective | Callable | Function f(theta) -> scalar deviance to minimize. | required |
theta0 | ndarray | Initial parameter values (array-like). | required |
lower | ndarray | Lower bounds for parameters. | required |
upper | ndarray | Upper bounds for parameters. | required |
rhobeg | float | Initial trust region radius (sets the initial step size). Default 0.002, matching lme4. | 0.002 |
rhoend | float | Final trust region radius (used to set tolerances). Default 2e-7 (lme4 default). | 2e-07 |
maxfun | int | Maximum function evaluations. Default 10000. | 10000 |
verbose | bool | Print optimization progress. | False |
Returns:
| Type | Description |
|---|---|
dict | dict with keys: - theta: Optimized parameters (np.ndarray) - fun: Final objective value (float) - converged: Whether optimization succeeded (bool) - n_evals: Number of function evaluations (int) - message: Status message (str) |
Examples:
>>> def rosenbrock(theta):
... x, y = theta
... return (1 - x)**2 + 100 * (y - x**2)**2
>>> result = optimize_theta(
... rosenbrock,
... theta0=[0.0, 0.0],
... lower=[-2.0, -2.0],
... upper=[2.0, 2.0]
... )
>>> result['converged']
True
>>> np.allclose(result['theta'], [1.0, 1.0], atol=1e-3)
True

solve_pls_sparse¶
solve_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, y: np.ndarray, pls_invariants: PLSInvariants | None = None) -> dict

Solve Penalized Least Squares system using Schur complement.
Uses lme4’s algorithm: factor only S22 = Λ’Z’ZΛ + I, then use Schur complement to solve for beta. This avoids factoring the full augmented system.
Algorithm (matching lme4/src/predModule.cpp):

1. Factor S22 = Λ’Z’ZΛ + I via CHOLMOD
2. Compute RZX = L^{-1} * Λ’Z’X (forward solve)
3. Compute Schur = X’X - RZX’ * RZX (rank update)
4. Solve for beta via Schur complement
5. Back-solve for u
ZΛ is kept sparse throughout. For InstEval-scale data (73k obs × 4k groups), this uses ~50MB instead of ~2.4GB per iteration.
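A dense sketch of the five Schur-complement steps, cross-checked against the full augmented normal equations (illustrative only; the real solver keeps ZΛ sparse and factors S22 with CHOLMOD):

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import cho_factor, cho_solve, cholesky, solve_triangular

rng = np.random.default_rng(1)
n, p, q = 50, 2, 6
X = rng.standard_normal((n, p))
Z = sp.random(n, q, density=0.3, format="csc", random_state=1)
Lam = sp.identity(q, format="csc") * 0.9
y = rng.standard_normal(n)

ZL = (Z @ Lam).toarray()
S22 = ZL.T @ ZL + np.eye(q)                          # 1. Lambda'Z'ZLambda + I
L = cholesky(S22, lower=True)
RZX = solve_triangular(L, ZL.T @ X, lower=True)      # 2. L^{-1} Lambda'Z'X
cu = solve_triangular(L, ZL.T @ y, lower=True)
schur = X.T @ X - RZX.T @ RZX                        # 3. rank update
beta = cho_solve(cho_factor(schur), X.T @ y - RZX.T @ cu)   # 4. fixed effects
u = solve_triangular(L.T, cu - RZX @ beta, lower=False)     # 5. back-solve

# Same solution as factoring the full augmented system directly.
A = np.block([[X.T @ X, X.T @ ZL], [ZL.T @ X, S22]])
sol = np.linalg.solve(A, np.concatenate([X.T @ y, ZL.T @ y]))
assert np.allclose(np.concatenate([beta, u]), sol, atol=1e-6)
```

The payoff of the Schur route is that only the sparse q×q block S22 needs a Cholesky factorization; the dense factorization is limited to the small p×p Schur complement.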
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X | ndarray | Fixed effects design matrix, shape (n, p). | required |
Z | csc_matrix | Random effects design matrix, shape (n, q), scipy sparse CSC. | required |
Lambda | csc_matrix | Cholesky-like factor, shape (q, q), scipy sparse CSC. | required |
y | ndarray | Response vector, shape (n,). | required |
pls_invariants | PLSInvariants | None | Optional pre-computed invariants (X’X, X’y, backend arrays). If provided, avoids redundant computation in optimization loop. Use compute_pls_invariants() to create this before the loop. | None |
Returns:
| Type | Description |
|---|---|
dict | Dictionary with keys: |
dict | - beta: Fixed effects coefficients, shape (p,). |
dict | - u: Spherical random effects, shape (q,). |
dict | - logdet_L: Log-determinant of (Λ’Z’ZΛ + I). |
dict | - rss: Residual sum of squares |
dict | - logdet_RX: Log-determinant of Schur complement (for REML). |
Examples:
>>> import numpy as np
>>> import scipy.sparse as sp
>>> n, p, q = 100, 3, 10
>>> X = np.random.randn(n, p)
>>> Z = sp.random(n, q, density=0.1, format='csc')
>>> Lambda = sp.eye(q, format='csc')
>>> y = np.random.randn(n)
>>> result = solve_pls_sparse(X, Z, Lambda, y)
>>> result['beta'].shape
(3,)
>>> result['u'].shape
(10,)

Notes:
- Converts Z and Lambda to CSC format if needed (CHOLMOD requirement)
- Uses CHOLMOD for numerically stable sparse Cholesky of S22
- Dense linear algebra uses JIT-compiled functions (via _get_dense_ops)
- ZΛ stays sparse; cross-products use efficient sparse-dense ops
- Returns RSS without penalty term (PWRSS = RSS + ||u||²)
solve_weighted_pls_sparse¶
solve_weighted_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, z: np.ndarray, weights: np.ndarray, beta_fixed: np.ndarray | None = None, ZL: sp.csc_matrix | None = None, ZL_dense: np.ndarray | None = None, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict

Solve weighted Penalized Least Squares for GLMM. Solves the augmented system

[X’WX     X’WZL      ] [beta]   [X’Wz  ]
[L’Z’WX   L’Z’WZL + I] [u   ] = [L’Z’Wz]

using sparse Cholesky factorization.
If beta_fixed is provided, solves only for u with beta held constant: [L’Z’WZL + I] [u] = [L’Z’W(z - X*beta)]
This is critical for Stage 2 optimization where we need to evaluate deviance at arbitrary (theta, beta) points without PIRLS re-optimizing beta.
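The beta-fixed reduced solve is just the lower-right block system. A dense sketch (the real routine uses a cached CHOLMOD factorization of the sparse S22 block), verified against the stationarity condition of the penalized weighted least-squares objective:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 30, 2, 4
X = rng.standard_normal((n, p))
ZL = rng.standard_normal((n, q)) * 0.5        # Z @ Lambda, dense stand-in
z = rng.standard_normal(n)                    # working response
w = rng.uniform(0.5, 2.0, size=n)             # IRLS weights
beta_fixed = np.array([0.3, -0.1])

WZL = w[:, None] * ZL
S22 = ZL.T @ WZL + np.eye(q)                  # L'Z'WZL + I
rhs = WZL.T @ (z - X @ beta_fixed)            # L'Z'W(z - X*beta)
u = np.linalg.solve(S22, rhs)

# u minimizes ||W^{1/2}(z - X*beta - ZL*u)||^2 + ||u||^2 for fixed beta:
r = z - X @ beta_fixed - ZL @ u
grad = -ZL.T @ (w * r) + u                    # gradient of the PLS objective
assert np.allclose(grad, 0.0, atol=1e-8)
```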
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X | ndarray | Fixed effects design matrix (n, p) | required |
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse CSC | required |
Lambda | csc_matrix | Cholesky-like factor (q, q), scipy sparse CSC | required |
z | ndarray | Working response (n,) | required |
weights | ndarray | IRLS weights (n,) | required |
beta_fixed | ndarray | None | If provided, solve only for u with beta fixed to this value. | None |
ZL | csc_matrix | None | Pre-computed Z @ Lambda (sparse). If provided, avoids redundant computation when called repeatedly with same Z and Lambda. | None |
ZL_dense | ndarray | None | Pre-computed ZL.toarray() (dense). If provided, avoids repeated sparse-to-dense materialization on each call. | None |
factor_S22 | ‘SparseFactorization | None’ | Cached Cholesky factor for S22 block. If provided, uses cholesky_inplace() to update with new values (same sparsity pattern). This avoids expensive symbolic analysis on repeated calls. | None |
factor_S | ‘SparseFactorization | None’ | Cached Cholesky factor for full augmented system S. Only used in full solve mode (when beta_fixed is None). | None |
pattern_template | ‘PatternTemplate | None’ | PatternTemplate for preserving sparsity patterns. When provided, ensures S22 has consistent pattern for cross-theta caching. Required when using factor_S22 with cross-theta caching. | None |
Returns:
| Type | Description |
|---|---|
dict | Dictionary with keys: |
dict | - beta: Fixed effects coefficients (p,) |
dict | - u: Spherical random effects (q,) |
dict | - logdet_L: Log-determinant of (L’Z’WZL + I) |
dict | - fitted_z: Fitted working response Xbeta + ZL*u |
dict | - factor_S22: Cholesky factor for S22 (for caching) |
dict | - factor_S: Cholesky factor for full system S (for caching, full mode only) |
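The block solve can be illustrated with dense NumPy algebra (a sketch of the math only; the library routine works on sparse CSC matrices with cached factorizations):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 2, 4
X = rng.normal(size=(n, p))
ZL = rng.normal(size=(n, q))           # stands in for Z @ Lambda
z = rng.normal(size=n)
W = np.diag(rng.uniform(0.5, 2.0, n))  # IRLS weights on the diagonal

# Blocks of the augmented penalized normal equations
S11 = X.T @ W @ X
S12 = X.T @ W @ ZL
S22 = ZL.T @ W @ ZL + np.eye(q)
r1, r2 = X.T @ W @ z, ZL.T @ W @ z

# Schur complement in beta: eliminate u first
schur = S11 - S12 @ np.linalg.solve(S22, S12.T)
beta = np.linalg.solve(schur, r1 - S12 @ np.linalg.solve(S22, r2))
u = np.linalg.solve(S22, r2 - S12.T @ beta)

# log|L'Z'WZL + I| from a Cholesky factor of S22
logdet = 2.0 * np.sum(np.log(np.diag(np.linalg.cholesky(S22))))
```

Eliminating u first via the Schur complement in beta is what lets the implementation reuse a cached Cholesky factor of S22 = L'Z'WZL + I across repeated calls.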
theta_to_cholesky_block¶
theta_to_cholesky_block(theta: np.ndarray, dim: int) -> np.ndarray
Convert theta vector to lower-triangular Cholesky block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (lower triangle, column-major). For 2x2: [L00, L10, L11], length = 3 For 3x3: [L00, L10, L11, L20, L21, L22], length = 6 | required |
dim | int | Dimension of the Cholesky block (2, 3, etc.). | required |
Returns:
| Type | Description |
|---|---|
ndarray | Lower triangular Cholesky block, shape (dim, dim). |
Examples:
>>> theta = np.array([1.5, 0.3, 1.2])
>>> L = theta_to_cholesky_block(theta, 2)
>>> L
array([[1.5, 0. ],
       [0.3, 1.2]])
Notes:
Fills lower triangle column-by-column (Fortran order)
Diagonal elements should be positive for valid Cholesky
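A minimal stand-alone sketch of the conversion, matching the documented element layout (not the library function; the documented orders [L00, L10, L11] and [L00, L10, L11, L20, L21, L22] coincide with np.tril_indices ordering):

```python
import numpy as np

def theta_to_block(theta: np.ndarray, dim: int) -> np.ndarray:
    # Fill the lower triangle in the documented element order,
    # which is the order produced by np.tril_indices(dim).
    L = np.zeros((dim, dim))
    L[np.tril_indices(dim)] = theta
    return L

L = theta_to_block(np.array([1.5, 0.3, 1.2]), 2)
# Matches the docstring example: [[1.5, 0.0], [0.3, 1.2]]
```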
update_lambda_from_template¶
update_lambda_from_template(template: LambdaTemplate, theta: np.ndarray) -> sp.csc_matrix
Update Lambda matrix from template using new theta values.
Efficiently updates only the data values without rebuilding structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
template | LambdaTemplate | LambdaTemplate created by build_lambda_template. | required |
theta | ndarray | New theta values. | required |
Returns:
| Type | Description |
|---|---|
csc_matrix | Updated sparse Lambda matrix in CSC format. |
Examples:
>>> template = build_lambda_template([18], "intercept")
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda[0, 0]
1.5
Notes:
Much faster than build_lambda_sparse for repeated calls
Modifies template.data in place for maximum efficiency
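The in-place update can be sketched with a hypothetical template: the real LambdaTemplate maps theta entries onto Lambda's CSC data array, and theta_map below plays that role for a 3-group random intercept (all names here are illustrative):

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical template: a random-intercept Lambda is theta[0] * I, so every
# stored nonzero reads the same theta entry.
n_groups = 3
pattern = sp.identity(n_groups, format="csc")
theta_map = np.zeros(pattern.nnz, dtype=int)  # nonzero k reads theta[theta_map[k]]

def update_lambda(pattern: sp.csc_matrix, theta_map: np.ndarray,
                  theta: np.ndarray) -> sp.csc_matrix:
    # Overwrite only the data array; the sparsity structure is untouched,
    # mirroring the "modifies template.data in place" behavior.
    pattern.data[:] = theta[theta_map]
    return pattern

Lam = update_lambda(pattern, theta_map, np.array([1.5]))
```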
Modules¶
glm¶
GLM IRLS fitting algorithm.
This module implements the Iteratively Reweighted Least Squares (IRLS) algorithm for GLM fitting. The algorithm supports both JAX and NumPy backends:
JAX backend: Uses lax.while_loop for JIT-compiled, efficient iteration
NumPy backend: Uses regular Python loops for Pyodide compatibility
All functions are designed to work with Family objects from
Functions:
| Name | Description |
|---|---|
compute_loglik | Compute log-likelihood for a fitted GLM. |
compute_null_deviance | Compute null deviance for the intercept-only model. |
fit_glm_irls | Fit GLM using IRLS algorithm. |
Functions¶
compute_loglik¶
compute_loglik(y: np.ndarray, mu: np.ndarray, deviance: float | np.ndarray, dispersion: float | np.ndarray, family: Family, n: int, xp: Any) -> np.ndarray | float
Compute log-likelihood for a fitted GLM.
Uses family-specific formulas for exact log-likelihood computation. Falls back to the deviance-based approximation for unknown families.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y | ndarray | Response values, shape (n,). | required |
mu | ndarray | Fitted values on response scale, shape (n,). | required |
deviance | float | ndarray | Residual deviance (scalar). | required |
dispersion | float | ndarray | Dispersion parameter estimate (scalar). | required |
family | Family | Family object with name attribute. | required |
n | int | Number of observations. | required |
xp | Any | Array module (numpy or jax.numpy). | required |
Returns:
| Type | Description |
|---|---|
ndarray | float | Log-likelihood as a scalar. |
compute_null_deviance¶
compute_null_deviance(y: np.ndarray, family: Family, xp: Any) -> np.ndarray | float
Compute null deviance for the intercept-only model.
The null deviance measures how well the response is predicted by a model with only an intercept (the mean). It is used as a baseline for assessing model fit improvement.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y | ndarray | Response values, shape (n,). | required |
family | Family | Family object with deviance function. | required |
xp | Any | Array module (numpy or jax.numpy). | required |
Returns:
| Type | Description |
|---|---|
ndarray | float | Null deviance as a scalar. |
fit_glm_irls¶
fit_glm_irls(y: np.ndarray, X: np.ndarray, family: Family, weights: np.ndarray | None = None, max_iter: int = 25, tol: float = 1e-08) -> dict
Fit GLM using IRLS algorithm.
Implements Iteratively Reweighted Least Squares for generalized linear models. The algorithm iterates until convergence (relative deviance change < tol) or max_iter is reached.
The core fitting loop is JIT-compiled for efficiency. All family-specific operations (link, variance, deviance) are pure JAX functions from the Family object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y | ndarray | Response values (n,) | required |
X | ndarray | Design matrix (n, p) | required |
family | Family | Family object with link, variance, and deviance functions | required |
weights | ndarray | None | Prior observation weights (n,) or None for equal weights | None |
max_iter | int | Maximum IRLS iterations (default: 25) | 25 |
tol | float | Convergence tolerance for relative deviance change (default: 1e-8) | 1e-08 |
Returns:
| Type | Description |
|---|---|
dict | Dictionary with keys: |
dict | - coef: Coefficient estimates (p,) |
dict | - vcov: Variance-covariance matrix (p, p) |
dict | - dispersion: Dispersion parameter estimate |
dict | - fitted: Fitted values μ on response scale (n,) |
dict | - linear_predictor: Linear predictor η on link scale (n,) |
dict | - residuals: Response residuals y - μ (n,) |
dict | - deviance: Residual deviance |
dict | - null_deviance: Null model deviance |
dict | - df_residual: Residual degrees of freedom |
dict | - loglik: Log-likelihood |
dict | - converged: True if converged, False otherwise |
dict | - n_iter: Number of iterations performed |
dict | - has_separation: Separation detection (binomial only, None otherwise) |
dict | - irls_weights: Final IRLS weights (n,) |
Notes:
Uses while_loop for early stopping on convergence
Clips IRLS weights to [1e-10, 1e10] for numerical stability
Clips μ values based on family type (binomial: [1e-10, 1-1e-10], poisson: [1e-10, inf])
Adds small regularization (1e-10) to diagonal of XtWX
Separation detection checks if fitted probabilities are extremely close to 0 or 1 (< 1e-6 or > 1-1e-6) for binomial models
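A hand-rolled IRLS loop for a logistic GLM shows the shape of the algorithm (simulated data; the library's fit_glm_irls adds weight clipping to [1e-10, 1e10], diagonal regularization, dispersion, and vcov on top of this sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

beta = np.zeros(p)
dev_old = np.inf
for it in range(25):                              # max_iter = 25
    eta = X @ beta
    mu = np.clip(1 / (1 + np.exp(-eta)), 1e-10, 1 - 1e-10)  # binomial mu clipping
    w = mu * (1 - mu)                             # canonical logit: w = V(mu)
    z = eta + (y - mu) / w                        # working response
    # Weighted least squares step: (X'WX) beta = X'Wz
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    # Binomial deviance for 0/1 responses
    dev = -2 * np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))
    if abs(dev_old - dev) / (abs(dev) + 1e-10) < 1e-8:  # relative deviance change
        break
    dev_old = dev
```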
glmer¶
PIRLS (Penalized Iteratively Reweighted Least Squares) algorithm for GLMMs.
This module implements the PIRLS algorithm for fitting generalized linear mixed models using Laplace approximation. The algorithm combines IRLS (from GLMs) with penalized least squares (from LMMs).
pirls_sparse.py: Core PIRLS inner-loop routines (IRLS quantities, weighted PLS, deviance, log-likelihood, PIRLS iteration).
heuristics.py: Initialization heuristics, bounds, finite-difference derivatives, Hessian-based vcov.
glmer.py (this file): Orchestrator for the deviance objectives and the top-level fit_glmm_pirls driver.
PIRLS Algorithm:¶
For fixed variance components theta:
1. Initialize eta from the GLM fit
2. Until convergence:
   a. Compute working weights: w = 1 / (V(mu) * (deta/dmu)^2)
   b. Compute working response: z = eta + (y - mu) * deta/dmu
   c. Solve the weighted penalized least squares system
      [X'WX    X'WZL      ] [beta]   [X'Wz  ]
      [L'Z'WX  L'Z'WZL + I] [u   ] = [L'Z'Wz]
   d. Update: eta = X*beta + ZL*u
   e. Check convergence on |delta_eta|
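The loop can be sketched for a random-intercept logistic model with dense NumPy algebra (simulated data; theta and the tolerances are illustrative, and the real implementation keeps Z and Lambda sparse):

```python
import numpy as np

rng = np.random.default_rng(2)
n_groups, per = 10, 20
n = n_groups * per
g = np.repeat(np.arange(n_groups), per)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.eye(n_groups)[g]                 # group indicator matrix (n, q)
theta = 0.9                             # fixed during PIRLS
ZL = Z * theta                          # Z @ Lambda for Lambda = theta * I
b_true = theta * rng.normal(size=n_groups)
eta_true = X @ np.array([-0.3, 0.8]) + Z @ b_true
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-eta_true))).astype(float)

p, q = X.shape[1], n_groups
beta, u = np.zeros(p), np.zeros(q)
for it in range(30):
    eta = X @ beta + ZL @ u
    mu = np.clip(1 / (1 + np.exp(-eta)), 1e-10, 1 - 1e-10)
    w = mu * (1 - mu)                   # working weights (logit link)
    z = eta + (y - mu) / w              # working response
    # Augmented penalized system; the +I block penalizes the spherical u
    A = np.block([[X.T @ (w[:, None] * X),  X.T @ (w[:, None] * ZL)],
                  [ZL.T @ (w[:, None] * X), ZL.T @ (w[:, None] * ZL) + np.eye(q)]])
    rhs = np.concatenate([X.T @ (w * z), ZL.T @ (w * z)])
    sol = np.linalg.solve(A, rhs)
    delta = np.max(np.abs(X @ sol[:p] + ZL @ sol[p:] - eta))
    beta, u = sol[:p], sol[p:]
    if delta < 1e-7:                    # convergence on |delta_eta|
        break
```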
GLMM Deviance (Laplace approximation):¶
-2 log L ~ sum(dev_resid) + log|L'Z'WZL + I| + ||u||^2
sum(dev_resid) = sum of GLM deviance residuals
log|L’Z’WZL + I| = log-determinant from Cholesky
||u||^2 = penalty term for spherical random effects
Reference:¶
lme4 external.cpp: pwrssUpdate (lines 308-375)
MixedModels.jl generalizedlinearmixedmodel.jl: pirls! function (lines 600-655), deviance function (lines 84-109)
Functions:
| Name | Description |
|---|---|
compute_glmm_loglik | Compute GLMM log-likelihood via Laplace approximation. |
compute_hessian_vcov | Compute Hessian-based vcov for fixed effects. |
compute_irls_quantities | Compute IRLS working weights and working response. |
deriv12 | Compute gradient and Hessian using central finite differences. |
fit_glmm_pirls | Fit GLMM using PIRLS with outer optimization over theta. |
get_theta_lower_bounds | Get lower bounds for theta parameters. |
glmm_deviance | Compute GLMM deviance via Laplace approximation. |
glmm_deviance_at_fixed_theta_beta | Compute GLMM deviance at fixed (theta, beta) by solving only for u. |
glmm_deviance_objective | Compute GLMM deviance for outer optimization. |
init_theta_glmm | Initialize theta for GLMM fitting. |
pirls_sparse | Penalized Iteratively Reweighted Least Squares using sparse operations. |
solve_weighted_pls_sparse | Solve weighted Penalized Least Squares for GLMM. |
Functions¶
compute_glmm_loglik¶
compute_glmm_loglik(y: np.ndarray, mu: np.ndarray, family: Family, sqrL: float, logdet: float, prior_weights: np.ndarray | None = None) -> float
Compute GLMM log-likelihood via Laplace approximation.
Matches lme4’s logLik() for GLMMs: loglik = cond_loglik - 0.5 * (||u||^2 + logdet)
where cond_loglik is the full conditional log-likelihood including normalizing constants (log(y!), log(choose(m,y)), etc.).
For weighted binomial (grouped), uses the full binomial log-likelihood with log(choose(m, y_counts)) to match lme4’s aic() function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y | ndarray | Response values (n,). For weighted binomial, these are proportions. | required |
mu | ndarray | Fitted values (n,). | required |
family | Family | GLM family with loglik method. | required |
sqrL | float | Sum of squared spherical random effects, \|\|u\|\|^2 | required
logdet | float | Log-determinant log\|L'Z'WZL + I\| | required
prior_weights | ndarray | None | Prior weights (n,). For binomial proportions, the number of trials per observation. | None |
Returns:
| Type | Description |
|---|---|
float | Log-likelihood (scalar). |
compute_hessian_vcov¶
compute_hessian_vcov(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, delta: float = 0.0001, eta_init: np.ndarray | None = None) -> tuple[np.ndarray, dict]
Compute Hessian-based vcov for fixed effects.
Matches lme4’s use.hessian=TRUE approach:
Compute numerical Hessian of deviance w.r.t. [theta, beta]
Invert the Hessian
Extract beta-beta block
Symmetrize
This uses lme4’s deriv12() approach (simple central differences with delta=1e-4) for exact parity with R.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Final theta (variance component) estimates | required |
beta | ndarray | Final beta (fixed effect) estimates | required |
X | ndarray | Fixed effects design matrix (n, p) | required |
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse | required |
y | ndarray | Response vector (n,) | required |
family | Family | GLM family (binomial, poisson) | required |
n_groups_list | list[int] | Number of groups per grouping factor | required |
re_structure | str | Random effects structure type | required |
metadata | dict | None | Additional RE metadata | None |
prior_weights | ndarray | None | Prior observation weights | None |
pirls_max_iter | int | Maximum PIRLS iterations per deviance evaluation | 30 |
pirls_tol | float | PIRLS convergence tolerance | 1e-07 |
delta | float | Finite difference step size (default 1e-4, matches lme4) | 0.0001 |
eta_init | ndarray | None | Initial linear predictor for PIRLS. If provided, used as starting point for all deviance evaluations. This provides numerical stability for models with large random effects. | None |
Returns:
| Type | Description |
|---|---|
tuple[ndarray, dict] | Tuple of: - vcov_beta: Variance-covariance matrix for beta, shape (p, p) - derivs: Dictionary with “gradient” and “Hessian” (for diagnostics) |
Notes: If the Hessian is not positive definite, this function returns the pseudo-inverse of the beta-beta block, with a warning.
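The extraction steps can be sketched as follows. H here is a made-up deviance Hessian with parameter order [theta, beta]; the factor of 2 is derived from deviance = -2*logLik (so the observed information is H/2 and vcov = 2*H^{-1}), an assumption of this sketch rather than a quoted detail:

```python
import numpy as np

n_theta = 1                             # one variance-component parameter
H = np.array([[4.0, 0.5, 0.2],          # illustrative Hessian of the deviance
              [0.5, 3.0, 0.1],          # w.r.t. [theta, beta_0, beta_1]
              [0.2, 0.1, 2.0]])

H_inv = np.linalg.inv(H)                # 1. invert the Hessian
V = 2.0 * H_inv[n_theta:, n_theta:]     # 2. extract the beta-beta block
vcov_beta = 0.5 * (V + V.T)             # 3. symmetrize
```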
compute_irls_quantities¶
compute_irls_quantities(y: np.ndarray, eta: np.ndarray, family: Family) -> tuple[np.ndarray, np.ndarray]
Compute IRLS working weights and working response.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y | ndarray | Response values (n,) | required |
eta | ndarray | Linear predictor (n,) | required |
family | Family | GLM family object | required |
Returns:
| Name | Type | Description |
|---|---|---|
working_weights | ndarray | IRLS weights w = 1 / (V(mu) * (deta/dmu)^2) |
working_response | ndarray | Pseudo-response z = eta + (y - mu) * deta/dmu |
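For the canonical logit link these formulas reduce to w = mu*(1-mu); a plain-NumPy sketch (hand-rolling the formulas rather than calling the package's Family object):

```python
import numpy as np

eta = np.array([-1.0, 0.0, 2.0])
y = np.array([0.0, 1.0, 1.0])

mu = 1.0 / (1.0 + np.exp(-eta))       # inverse logit link
detadmu = 1.0 / (mu * (1.0 - mu))     # d eta / d mu
var = mu * (1.0 - mu)                 # binomial variance function V(mu)

working_weights = 1.0 / (var * detadmu**2)   # reduces to mu*(1-mu) here
working_response = eta + (y - mu) * detadmu
```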
deriv12¶
deriv12(func: Callable[[np.ndarray], float], x: np.ndarray, delta: float = 0.0001, fx: float | None = None) -> dict
Compute gradient and Hessian using central finite differences.
This matches lme4’s deriv12() function exactly (lme4/R/deriv.R). Uses simple central differences with step size delta.
The Hessian is computed simultaneously with the gradient for efficiency:
Diagonal: H[j,j] = (f(x+d) - 2f(x) + f(x-d)) / d^2
Off-diag: H[i,j] = (f(x+di+dj) - f(x+di-dj) - f(x-di+dj) + f(x-di-dj)) / (4d^2)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
func | Callable[[ndarray], float] | Scalar function R^n -> R to differentiate. | required |
x | ndarray | Point at which to compute derivatives, shape (n,). | required |
delta | float | Step size for finite differences (default 1e-4, matches lme4). | 0.0001 |
fx | float | None | Optional pre-computed f(x) to avoid redundant evaluation. | None |
Returns:
| Type | Description |
|---|---|
dict | Dictionary with: - “gradient”: First derivative vector, shape (n,) - “Hessian”: Second derivative matrix, shape (n, n) |
Note: This is O(h^2) accurate, matching lme4’s approach. For higher accuracy (O(h^8)), use compute_hessian_richardson from
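The scheme can be written as a short stand-alone function (a sketch of the formulas above, not the library's deriv12):

```python
import numpy as np

def deriv12_sketch(func, x, delta=1e-4):
    """Central-difference gradient and Hessian, computed together."""
    n = x.size
    fx = func(x)
    grad = np.zeros(n)
    H = np.zeros((n, n))
    for j in range(n):
        ej = np.zeros(n); ej[j] = delta
        fp, fm = func(x + ej), func(x - ej)
        grad[j] = (fp - fm) / (2 * delta)
        H[j, j] = (fp - 2 * fx + fm) / delta**2          # diagonal formula
        for i in range(j):
            ei = np.zeros(n); ei[i] = delta
            H[i, j] = H[j, i] = (func(x + ei + ej) - func(x + ei - ej)
                                 - func(x - ei + ej) + func(x - ei - ej)) / (4 * delta**2)
    return {"gradient": grad, "Hessian": H}

# Quadratic test function: f(x, y) = x^2 + 3xy, exact for central differences
out = deriv12_sketch(lambda v: v[0]**2 + 3 * v[0] * v[1], np.array([1.0, 2.0]))
```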
fit_glmm_pirls¶
fit_glmm_pirls(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, theta_init: np.ndarray | None = None, prior_weights: np.ndarray | None = None, max_outer_iter: int = 10000, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, two_stage: bool = True, lambda_template: 'LambdaTemplate | None' = None, nAGQ: int = 1, group_ids: np.ndarray | None = None) -> dict
Fit GLMM using PIRLS with outer optimization over theta.
Optimization strategy (matching lme4):
Stage 1: BOBYQA optimizes theta only, PIRLS solves for (beta, u) at each theta
Stage 2: Nelder-Mead optimizes [theta, beta] jointly, PIRLS only solves for u
The two-stage approach matches lme4 exactly:
Stage 1 (nAGQ=0): Optimize theta via BOBYQA, with PIRLS finding optimal (beta, u) for each theta.
Stage 2 (nAGQ>=1): Joint optimization of [theta, beta] via Nelder-Mead, where beta is held fixed in PIRLS (added to offset) so PIRLS only solves for u. This allows the optimizer to find the global (theta, beta) optimum.
When nAGQ > 1, Stage 2 uses adaptive Gauss-Hermite quadrature instead of the Laplace approximation for more accurate marginal likelihood.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X | ndarray | Fixed effects design matrix (n, p) | required |
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse | required |
y | ndarray | Response vector (n,) | required |
family | Family | GLM family (binomial, poisson) | required |
n_groups_list | list[int] | Number of groups per grouping factor | required |
re_structure | str | Random effects structure type | required |
metadata | dict | None | Additional RE metadata | None |
theta_init | ndarray | None | Initial theta values (optional) | None |
prior_weights | ndarray | None | Prior observation weights | None |
max_outer_iter | int | Maximum BOBYQA iterations for Stage 1 | 10000 |
pirls_max_iter | int | Maximum PIRLS iterations per theta evaluation | 30 |
pirls_tol | float | PIRLS convergence tolerance (1e-7 matches lme4) | 1e-07 |
verbose | bool | Print optimization progress | False |
two_stage | bool | Use two-stage optimization matching lme4 (default True) | True |
lambda_template | ‘LambdaTemplate | None’ | Optional pre-built template for efficient Lambda updates. If provided, uses this template instead of building a new one. This is useful for bootstrap where the same template can be reused. | None |
nAGQ | int | Number of adaptive Gauss-Hermite quadrature points (default 1). nAGQ=1 uses the Laplace approximation. nAGQ > 1 uses AGQ for more accurate integration (scalar random intercept only). | 1 |
group_ids | ndarray | None | Group membership array (n,), integer-coded. Required when nAGQ > 1 for per-group quadrature. | None |
Returns:
| Type | Description |
|---|---|
dict | Dictionary with: |
dict | - theta: Optimized theta parameters |
dict | - beta: Fixed effects coefficients |
dict | - u: Spherical random effects |
dict | - eta: Linear predictor |
dict | - mu: Fitted values |
dict | - deviance: Final GLMM deviance |
dict | - loglik: Laplace log-likelihood |
dict | - converged: Whether optimizer converged |
dict | - n_outer_iter: Number of outer iterations |
dict | - n_func_evals: Number of deviance evaluations |
dict | - pirls_converged: Whether final PIRLS converged |
dict | - pirls_n_iter: Final PIRLS iterations |
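The two-stage schedule can be illustrated with toy objectives (stand-ins for the PIRLS-backed deviances, so the minimizers are chosen arbitrarily here; lme4 uses BOBYQA in Stage 1, for which SciPy's bounded Powell substitutes in this sketch):

```python
import numpy as np
from scipy.optimize import minimize

# Stage 1: profiled deviance over theta only; in the real fit, each
# evaluation runs PIRLS to find the optimal (beta, u) for that theta.
def stage1_profiled_deviance(theta):
    return float((theta[0] - 0.8) ** 2 + 100.0)

res1 = minimize(stage1_profiled_deviance, x0=[1.0],
                method="Powell", bounds=[(0.0, None)])  # theta >= 0

# Stage 2: joint deviance over [theta, beta]; in the real fit, beta is held
# fixed inside PIRLS (added to the offset) so PIRLS only solves for u.
def stage2_joint_deviance(params):
    theta, beta = params
    return float((theta - 0.8) ** 2 + (beta - 0.5) ** 2 + 100.0)

res2 = minimize(stage2_joint_deviance, x0=[res1.x[0], 0.0],
                method="Nelder-Mead")
```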
get_theta_lower_bounds¶
get_theta_lower_bounds(n_theta: int, re_structure: str, metadata: dict | None = None) -> list[float]
Get lower bounds for theta parameters.
Diagonal elements of Cholesky factor must be non-negative. Off-diagonal elements are unbounded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n_theta | int | Number of theta parameters | required |
re_structure | str | Random effects structure type | required |
metadata | dict | None | Optional metadata dict with ‘re_structures_list’ for crossed/nested/mixed structures | None |
Returns:
| Type | Description |
|---|---|
list[float] | List of lower bounds |
glmm_deviance¶
glmm_deviance(y: np.ndarray, mu: np.ndarray, family: Family, logdet: float, sqrL: float, prior_weights: np.ndarray | None = None) -> float
Compute GLMM deviance via Laplace approximation.
deviance_GLMM = sum(dev_resid) + logdet + ||u||^2
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y | ndarray | Response values (n,) | required |
mu | ndarray | Fitted values mu = g^{-1}(eta) (n,) | required |
family | Family | GLM family | required |
logdet | float | Log-determinant log\|L'Z'WZL + I\| | required
sqrL | float | Sum of squared spherical random effects, \|\|u\|\|^2 | required
prior_weights | ndarray | None | Prior weights for observations (n,). For binomial with proportions, this is the number of trials. | None |
Returns:
| Type | Description |
|---|---|
float | GLMM deviance (scalar) |
glmm_deviance_at_fixed_theta_beta¶
glmm_deviance_at_fixed_theta_beta(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, eta_init: np.ndarray | None = None, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None, pattern_template: 'PatternTemplate | None' = None) -> float
Compute GLMM deviance at fixed (theta, beta) by solving only for u.
This is the Stage 2 objective function for joint (theta, beta) optimization. Unlike glmm_deviance_objective which optimizes beta via PIRLS, this function holds beta fixed and only solves for the random effects u.
This matches lme4’s approach in Stage 2 where beta is added to the offset before PIRLS runs, so PIRLS only solves for u.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (lower triangle, column-major) | required |
beta | ndarray | Fixed effects coefficients (p,) - held constant | required |
X | ndarray | Fixed effects design matrix (n, p) | required |
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse | required |
y | ndarray | Response vector (n,) | required |
family | Family | GLM family (binomial, poisson) | required |
n_groups_list | list[int] | Number of groups per grouping factor | required |
re_structure | str | Random effects structure type | required |
metadata | dict | None | Additional RE metadata | None |
prior_weights | ndarray | None | Prior observation weights | None |
pirls_max_iter | int | Maximum PIRLS iterations per theta evaluation | 30 |
pirls_tol | float | PIRLS convergence tolerance (1e-7 matches lme4) | 1e-07 |
verbose | bool | Print verbose output | False |
eta_init | ndarray | None | Initial linear predictor for PIRLS. If provided, used as starting point instead of X @ beta. This matches lme4’s lp0 mechanism where the full linear predictor from Stage 1 (Xbeta + ZLambda*u) is saved and restored for Stage 2. | None |
lambda_template | ‘LambdaTemplate | None’ | Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse. This avoids rebuilding the sparsity pattern on each call (~10-15% speedup). | None |
factor_cache | dict | None | Optional mutable dict for caching Cholesky factors across theta evaluations. Pass an empty dict {} to enable caching. Caches factor_S22 and factor_S to avoid expensive Cholesky symbolic analysis on each evaluation. | None |
pattern_template | ‘PatternTemplate | None’ | PatternTemplate for preserving sparsity patterns. Required for cross-theta factor caching. When provided, ensures S22 has consistent sparsity pattern even when theta hits boundaries (theta=0). | None |
Returns:
| Type | Description |
|---|---|
float | GLMM deviance (scalar) at the exact (theta, beta) point |
glmm_deviance_objective¶
glmm_deviance_objective(theta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None, pattern_template: 'PatternTemplate | None' = None, pirls_result_cache: dict | None = None) -> float
Compute GLMM deviance for outer optimization.
This is the objective function for BOBYQA optimization over theta.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (lower triangle, column-major) | required |
X | ndarray | Fixed effects design matrix (n, p) | required |
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse | required |
y | ndarray | Response vector (n,) | required |
family | Family | GLM family (binomial, poisson) | required |
n_groups_list | list[int] | Number of groups per grouping factor | required |
re_structure | str | Random effects structure type | required |
metadata | dict | None | Additional RE metadata | None |
prior_weights | ndarray | None | Prior observation weights | None |
pirls_max_iter | int | Maximum PIRLS iterations per theta evaluation | 30 |
pirls_tol | float | PIRLS convergence tolerance (1e-7 matches lme4) | 1e-07 |
verbose | bool | Print verbose output | False |
lambda_template | ‘LambdaTemplate | None’ | Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse. This avoids rebuilding the sparsity pattern on each call (~10-15% speedup). | None |
factor_cache | dict | None | Optional mutable dict for caching Cholesky factors across theta evaluations. Should contain keys ‘factor_S22’ and ‘factor_S’. Pass an empty dict {} to enable caching; factors will be stored and reused across calls, avoiding repeated symbolic analysis. | None |
pattern_template | ‘PatternTemplate | None’ | PatternTemplate for preserving sparsity patterns. Required for cross-theta factor caching. When provided, ensures S22 has consistent sparsity pattern even when theta hits boundaries (theta=0). | None |
pirls_result_cache | dict | None | Optional mutable dict for caching the last PIRLS result. When provided, stores {"theta": theta_copy, "result": pirls_result} on each call so the caller can reuse the result at the converged theta without a redundant PIRLS evaluation. | None |
Returns:
| Type | Description |
|---|---|
float | GLMM deviance (scalar) - the Laplace approximation of -2*logLik |
init_theta_glmm¶
init_theta_glmm(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> np.ndarray
Initialize theta for GLMM fitting.
Uses conservative defaults. Unlike LMMs, GLMMs work on the link scale so moment-based initialization is less useful.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n_groups_list | list[int] | Number of groups per grouping factor. | required |
re_structure | str | Random effects structure type. | required |
metadata | dict | None | Optional metadata dict containing ‘random_names’ or ‘re_terms’ to determine correct number of theta parameters. | None |
Returns:
| Type | Description |
|---|---|
ndarray | Initial theta values. |
pirls_sparse¶
pirls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, y: np.ndarray, family: Family, eta_init: np.ndarray | None = None, prior_weights: np.ndarray | None = None, beta_fixed: np.ndarray | None = None, max_iter: int = 30, tol: float = 1e-07, verbose: bool = False, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict
Penalized Iteratively Reweighted Least Squares using sparse operations.
For fixed variance parameters encoded in Lambda, iteratively solves for beta and u using weighted penalized least squares. This is the inner loop of GLMM optimization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X | ndarray | Fixed effects design matrix (n, p) | required |
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse | required |
Lambda | csc_matrix | Cholesky-like factor from theta (q, q), scipy sparse | required |
y | ndarray | Response vector (n,) | required |
family | Family | GLM family (binomial, poisson) | required |
eta_init | ndarray | None | Initial linear predictor (optional) | None |
prior_weights | ndarray | None | Prior observation weights (n,). For binomial with proportions, this should be the number of trials. | None |
beta_fixed | ndarray | None | If provided, solve only for u with beta held fixed. This is used in Stage 2 optimization to evaluate deviance at arbitrary (theta, beta) points without re-optimizing beta. | None |
max_iter | int | Maximum PIRLS iterations | 30 |
tol | float | Convergence tolerance on relative deviance change (1e-7 matches lme4) | 1e-07 |
verbose | bool | Print iteration info | False |
factor_S22 | ‘SparseFactorization | None’ | Cached Cholesky factor for S22 block (L’Z’WZL + I). If provided, reuses symbolic analysis across calls. Useful for caching across theta evaluations in the optimizer. | None |
factor_S | ‘SparseFactorization | None’ | Cached Cholesky factor for full augmented system. Only used when beta_fixed is None (full solve mode). | None |
pattern_template | ‘PatternTemplate | None’ | PatternTemplate for preserving sparsity patterns. When provided, ensures S22 has consistent pattern for cross-theta caching. This is required for cross-theta factor caching to work correctly when theta has boundary values (theta=0). | None |
Returns:
| Type | Description |
|---|---|
dict | Dictionary with keys: |
dict | - beta: Fixed effects coefficients (p,) |
dict | - u: Spherical random effects (q,) |
dict | - eta: Final linear predictor (n,) |
dict | - mu: Final fitted values (n,) |
dict | - deviance: GLMM deviance at convergence |
dict | - converged: Whether PIRLS converged |
dict | - n_iter: Number of iterations |
dict | - logdet_L: Log-determinant |
dict | - factor_S22: Cholesky factor for S22 (for caching) |
dict | - factor_S: Cholesky factor for full system (for caching, full mode only) |
solve_weighted_pls_sparse¶
solve_weighted_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, z: np.ndarray, weights: np.ndarray, beta_fixed: np.ndarray | None = None, ZL: sp.csc_matrix | None = None, ZL_dense: np.ndarray | None = None, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict
Solve weighted Penalized Least Squares for GLMM.
Solves the block system
[X'WX    X'WZL      ] [beta]   [X'Wz  ]
[L'Z'WX  L'Z'WZL + I] [u   ] = [L'Z'Wz]
using sparse Cholesky factorization.
If beta_fixed is provided, solves only for u with beta held constant: [L'Z'WZL + I] [u] = [L'Z'W(z - X*beta)]
This is critical for Stage 2 optimization where we need to evaluate deviance at arbitrary (theta, beta) points without PIRLS re-optimizing beta.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X | ndarray | Fixed effects design matrix (n, p) | required |
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse CSC | required |
Lambda | csc_matrix | Cholesky-like factor (q, q), scipy sparse CSC | required |
z | ndarray | Working response (n,) | required |
weights | ndarray | IRLS weights (n,) | required |
beta_fixed | ndarray | None | If provided, solve only for u with beta fixed to this value. | None |
ZL | csc_matrix | None | Pre-computed Z @ Lambda (sparse). If provided, avoids redundant computation when called repeatedly with same Z and Lambda. | None |
ZL_dense | ndarray | None | Pre-computed ZL.toarray() (dense). If provided, avoids repeated sparse-to-dense materialization on each call. | None |
factor_S22 | ‘SparseFactorization | None’ | Cached Cholesky factor for S22 block. If provided, uses cholesky_inplace() to update with new values (same sparsity pattern). This avoids expensive symbolic analysis on repeated calls. | None |
factor_S | ‘SparseFactorization | None’ | Cached Cholesky factor for full augmented system S. Only used in full solve mode (when beta_fixed is None). | None |
pattern_template | ‘PatternTemplate | None’ | PatternTemplate for preserving sparsity patterns. When provided, ensures S22 has consistent pattern for cross-theta caching. Required when using factor_S22 with cross-theta caching. | None |
Returns:
| Type | Description |
|---|---|
dict | Dictionary with keys: |
dict | - beta: Fixed effects coefficients (p,) |
dict | - u: Spherical random effects (q,) |
dict | - logdet_L: Log-determinant of (L’Z’WZL + I) |
dict | - fitted_z: Fitted working response Xbeta + ZL*u |
dict | - factor_S22: Cholesky factor for S22 (for caching) |
dict | - factor_S: Cholesky factor for full system S (for caching, full mode only) |
heuristics¶
Heuristics and numerical utilities for GLMM fitting.
This module contains initialization heuristics, bound computation, and
finite-difference derivative routines used by the GLMM fitter in glmer.py.
Functions¶
compute_hessian_vcov¶
compute_hessian_vcov(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, delta: float = 0.0001, eta_init: np.ndarray | None = None) -> tuple[np.ndarray, dict]
Compute Hessian-based vcov for fixed effects.
Matches lme4’s use.hessian=TRUE approach:
Compute numerical Hessian of deviance w.r.t. [theta, beta]
Invert the Hessian
Extract beta-beta block
Symmetrize
This uses lme4’s deriv12() approach (simple central differences with delta=1e-4) for exact parity with R.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Final theta (variance component) estimates | required |
beta | ndarray | Final beta (fixed effect) estimates | required |
X | ndarray | Fixed effects design matrix (n, p) | required |
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse | required |
y | ndarray | Response vector (n,) | required |
family | Family | GLM family (binomial, poisson) | required |
n_groups_list | list[int] | Number of groups per grouping factor | required |
re_structure | str | Random effects structure type | required |
metadata | dict | None | Additional RE metadata | None |
prior_weights | ndarray | None | Prior observation weights | None |
pirls_max_iter | int | Maximum PIRLS iterations per deviance evaluation | 30 |
pirls_tol | float | PIRLS convergence tolerance | 1e-07 |
delta | float | Finite difference step size (default 1e-4, matches lme4) | 0.0001 |
eta_init | ndarray | None | Initial linear predictor for PIRLS. If provided, used as starting point for all deviance evaluations. This provides numerical stability for models with large random effects. | None |
Returns:
| Type | Description |
|---|---|
tuple[ndarray, dict] | Tuple of: - vcov_beta: Variance-covariance matrix for beta, shape (p, p) - derivs: Dictionary with “gradient” and “Hessian” (for diagnostics) |
Notes: If the Hessian is not positive definite, this function returns the pseudo-inverse of the beta-beta block, with a warning.
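The invert-and-extract steps above can be sketched on a toy Hessian. The numbers here are illustrative, not from a fitted model; the real function obtains the Hessian over [theta, beta] numerically via deriv12.

```python
import numpy as np

# Toy Hessian over [theta (1 param), beta (2 params)], assumed positive definite.
H = np.array([[4.0, 0.8, 0.2],
              [0.8, 3.0, 0.1],
              [0.2, 0.1, 2.0]])
n_theta = 1

Hinv = np.linalg.inv(H)                   # invert full Hessian
vcov_beta = Hinv[n_theta:, n_theta:]      # extract beta-beta block
vcov_beta = 0.5 * (vcov_beta + vcov_beta.T)   # symmetrize against rounding
```

Extracting the block of the inverse (rather than inverting the beta-beta block of H) correctly accounts for theta-beta correlation in the curvature.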
deriv12¶
deriv12(func: Callable[[np.ndarray], float], x: np.ndarray, delta: float = 0.0001, fx: float | None = None) -> dict
Compute gradient and Hessian using central finite differences.
This matches lme4’s deriv12() function exactly (lme4/R/deriv.R). Uses simple central differences with step size delta.
The Hessian is computed simultaneously with the gradient for efficiency:
Diagonal: H[j,j] = (f(x+d) - 2f(x) + f(x-d)) / d^2
Off-diag: H[i,j] = (f(x+di+dj) - f(x+di-dj) - f(x-di+dj) + f(x-di-dj)) / (4d^2)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
func | Callable[[ndarray], float] | Scalar function R^n -> R to differentiate. | required |
x | ndarray | Point at which to compute derivatives, shape (n,). | required |
delta | float | Step size for finite differences (default 1e-4, matches lme4). | 0.0001 |
fx | float | None | Optional pre-computed f(x) to avoid redundant evaluation. | None |
Returns:
| Type | Description |
|---|---|
dict | Dictionary with: - “gradient”: First derivative vector, shape (n,) - “Hessian”: Second derivative matrix, shape (n, n) |
Note: This is O(h^2) accurate, matching lme4's approach. For higher accuracy (O(h^8)), use compute_hessian_richardson.
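The central-difference scheme above can be sketched as follows. This is a stand-in illustration of the same formulas, not the library's deriv12 itself (which also shares evaluations between the gradient and Hessian loops).

```python
import numpy as np

def central_grad_hess(f, x, delta=1e-4):
    """Gradient and Hessian via simple central differences (O(delta^2))."""
    n = x.size
    g = np.zeros(n)
    H = np.zeros((n, n))
    fx = f(x)
    for j in range(n):
        ej = np.zeros(n); ej[j] = delta
        fp, fm = f(x + ej), f(x - ej)
        g[j] = (fp - fm) / (2 * delta)                    # central gradient
        H[j, j] = (fp - 2 * fx + fm) / delta**2           # diagonal Hessian
    for i in range(n):
        for j in range(i + 1, n):
            ei = np.zeros(n); ei[i] = delta
            ej = np.zeros(n); ej[j] = delta
            # four-point mixed partial
            H[i, j] = H[j, i] = (f(x + ei + ej) - f(x + ei - ej)
                                 - f(x - ei + ej) + f(x - ei - ej)) / (4 * delta**2)
    return {"gradient": g, "Hessian": H}

# Quadratic check: f(x) = x'Ax/2 has gradient Ax and Hessian A exactly.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
out = central_grad_hess(lambda v: 0.5 * v @ A @ v, np.array([1.0, -1.0]))
```

On a quadratic the scheme is exact up to floating-point rounding, which makes it a convenient sanity check.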
get_theta_lower_bounds¶
get_theta_lower_bounds(n_theta: int, re_structure: str, metadata: dict | None = None) -> list[float]
Get lower bounds for theta parameters.
Diagonal elements of Cholesky factor must be non-negative. Off-diagonal elements are unbounded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n_theta | int | Number of theta parameters | required |
re_structure | str | Random effects structure type | required |
metadata | dict | None | Optional metadata dict with ‘re_structures_list’ for crossed/nested/mixed structures | None |
Returns:
| Type | Description |
|---|---|
list[float] | List of lower bounds |
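The bound rule (zero for diagonal Cholesky entries, unbounded for off-diagonals) can be sketched for a single block. This helper is illustrative, not the library's function; it assumes the lower triangle is walked column by column, which coincides with the documented [L00, L10, L11] layout in the 2x2 case.

```python
import numpy as np

def lower_bounds_for_block(dim):
    """Lower bounds for one dim x dim Cholesky block of theta."""
    bounds = []
    for col in range(dim):
        for row in range(col, dim):
            # Diagonal entries must be non-negative; off-diagonals are free.
            bounds.append(0.0 if row == col else -np.inf)
    return bounds

# Correlated slopes (2x2 block): theta = [L00, L10, L11] -> [0, -inf, 0]
```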
init_theta_glmm¶
init_theta_glmm(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> np.ndarray
Initialize theta for GLMM fitting.
Uses conservative defaults. Unlike LMMs, GLMMs work on the link scale so moment-based initialization is less useful.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n_groups_list | list[int] | Number of groups per grouping factor. | required |
re_structure | str | Random effects structure type. | required |
metadata | dict | None | Optional metadata dict containing ‘random_names’ or ‘re_terms’ to determine correct number of theta parameters. | None |
Returns:
| Type | Description |
|---|---|
ndarray | Initial theta values. |
initialization¶
Moment-based initialization for LMM variance components.
Functions:
| Name | Description |
|---|---|
compute_moment_init | Compute moment-based starting values for theta. |
Functions¶
compute_moment_init¶
compute_moment_init(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, group_ids_list: list[np.ndarray], n_groups_list: list[int], re_structure: str, X_re: np.ndarray | None = None, metadata: dict | None = None) -> np.ndarray
Compute moment-based starting values for theta.
Dispatches to appropriate initializer based on re_structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X | ndarray | Fixed effects design matrix, shape (n, p). | required |
Z | csc_matrix | Random effects design matrix, shape (n, q), scipy sparse. | required |
y | ndarray | Response vector, shape (n,). | required |
group_ids_list | list[ndarray] | List of group ID arrays (one per grouping factor). For simple models: single array of shape (n,) with values 0 to n_groups-1 For crossed/nested: multiple arrays | required |
n_groups_list | list[int] | Number of groups per grouping factor. | required |
re_structure | str | Random effects structure type. Options: “intercept”, “slope”, “diagonal”, “crossed”, “nested”, “mixed” | required |
X_re | ndarray | None | Random effects design matrix, shape (n, r). For slopes: typically (n, 2) with columns [ones, predictor] Not needed for intercept-only models. | None |
metadata | dict | None | Optional metadata dict with structure information. For crossed/nested/mixed, should contain ‘re_structures_list’ specifying each factor’s structure (e.g., [‘slope’, ‘intercept’]). | None |
Returns:
| Type | Description |
|---|---|
ndarray | Initial theta values, shape varies by structure: |
ndarray | - Intercept: (1,) or (n_factors,) for crossed/nested |
ndarray | - Slope: (3,) for 2x2 Cholesky |
ndarray | - Diagonal: (k,) for k RE per group |
ndarray | - Crossed/nested with mixed: concatenation per factor |
Examples:
>>> import numpy as np
>>> import scipy.sparse as sp
>>> # Simple random intercept
>>> X = np.random.randn(100, 2)
>>> Z = sp.random(100, 10, density=0.1, format='csc')
>>> y = np.random.randn(100)
>>> group_ids = [np.repeat(np.arange(10), 10)]
>>> theta = compute_moment_init(X, Z, y, group_ids, [10], "intercept")
>>> theta.shape
(1,)
Notes:
Uses lme4’s variance decomposition when possible
Falls back to conservative defaults if estimation fails
Returns theta on relative scale (τ/σ)
lambda_builder¶
Lambda matrix construction from theta parameters.
This module is a re-export facade. Implementation lives in:
lambda_template: Template/pattern caching for efficient optimization
lambda_sparse: Sparse matrix construction and theta-Cholesky conversions
All public names are re-exported here for backward compatibility.
Classes:
| Name | Description |
|---|---|
LambdaTemplate | Template for efficient Lambda matrix updates during optimization. |
PatternTemplate | Template for preserving sparsity patterns across theta evaluations. |
Functions:
| Name | Description |
|---|---|
build_lambda_sparse | Build sparse block-diagonal Lambda matrix from theta. |
build_lambda_template | Build a Lambda template for efficient repeated updates. |
build_pattern_template | Build pattern template for cross-theta Cholesky caching. |
cholesky_block_to_theta | Convert lower-triangular Cholesky block to theta vector. |
compute_s22_preserve_pattern | Compute ZL' @ W @ ZL + I while preserving S22_pattern's sparsity. |
compute_zl_preserve_pattern | Compute Z @ Lambda while preserving the sparsity pattern of ZL_pattern. |
theta_to_cholesky_block | Convert theta vector to lower-triangular Cholesky block. |
theta_to_variance_params | Convert theta (Cholesky elements) to variance parameters. |
update_lambda_from_template | Update Lambda matrix from template using new theta values. |
Classes¶
LambdaTemplate¶
LambdaTemplate(template: sp.csc_matrix, theta_to_data_map: list[np.ndarray], re_structure: str, n_groups_list: list[int]) -> None
Template for efficient Lambda matrix updates during optimization.
Caches the sparsity pattern of Lambda so that repeated calls only update the data values, avoiding reconstruction of indices.
Attributes:
| Name | Type | Description |
|---|---|---|
template | csc_matrix | Sparse CSC matrix with correct structure but placeholder values. |
theta_to_data_map | list[ndarray] | List mapping theta indices to positions in template.data. Each element is an int64 array of indices into template.data. |
re_structure | str | The random effects structure type. |
n_groups_list | list[int] | Number of groups per factor. |
Attributes¶
n_groups_list¶
n_groups_list: list[int]
re_structure¶
re_structure: str
template¶
template: sp.csc_matrix
theta_to_data_map¶
theta_to_data_map: list[np.ndarray]
PatternTemplate¶
PatternTemplate(ZL_pattern: sp.csc_matrix, S22_pattern: sp.csc_matrix, Z: sp.csc_matrix, lambda_template: 'LambdaTemplate') -> None
Template for preserving sparsity patterns across theta evaluations.
This is critical for cross-theta Cholesky caching. When theta has boundary values (theta=0), the resulting matrices would normally have fewer non-zeros. This template ensures the sparsity pattern remains constant by storing explicit zeros where needed.
Used to match lme4’s behavior where the sparsity pattern is fixed at initialization and never changes during optimization.
Attributes:
| Name | Type | Description |
|---|---|---|
ZL_pattern | csc_matrix | Sparsity pattern of Z @ Lambda (computed with theta=ones). |
S22_pattern | csc_matrix | Sparsity pattern of ZL' @ ZL + I (computed with theta=ones). |
Z | csc_matrix | Reference to the original Z matrix for value computation. |
lambda_template | 'LambdaTemplate' | Reference to the LambdaTemplate. |
Attributes¶
S22_pattern¶
S22_pattern: sp.csc_matrix
Z¶
Z: sp.csc_matrix
ZL_pattern¶
ZL_pattern: sp.csc_matrix
lambda_template¶
lambda_template: 'LambdaTemplate'
Functions¶
build_lambda_sparse¶
build_lambda_sparse(theta: np.ndarray, n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> sp.csc_matrix
Build sparse block-diagonal Lambda matrix from theta.
Lambda is constructed based on the random effects structure:
Intercept: Diagonal with theta[i] repeated
Diagonal: Block-diagonal with diagonal blocks
Slope: Block-diagonal with full Cholesky blocks
Nested/Crossed: Concatenated blocks per grouping factor
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (lower triangle, column-major). | required |
n_groups_list | list[int] | Number of groups per grouping factor. For simple models: [n_groups] For nested/crossed: [n_groups_factor1, n_groups_factor2, ...] | required |
re_structure | str | Random effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed” | required |
metadata | dict | None | Optional metadata dict with structure information. May contain ‘re_structures_list’ for mixed models. | None |
Returns:
| Type | Description |
|---|---|
csc_matrix | Sparse block-diagonal Lambda matrix, CSC format. |
Examples:
>>> # Random intercept: (1|group)
>>> theta = np.array([1.5])
>>> Lambda = build_lambda_sparse(theta, [10], "intercept")
>>> Lambda.shape
(10, 10)
>>> # Correlated slopes: (1+x|group)
>>> theta = np.array([1.5, 0.3, 1.2]) # [L00, L10, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "slope")
>>> Lambda.shape
(36, 36)  # 18 groups x 2 RE per group
>>> # Uncorrelated slopes: (1+x||group)
>>> theta = np.array([1.5, 1.2]) # [L00, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "diagonal")
>>> Lambda.shape
(36, 36)
Notes:
Returns CSC format for compatibility with CHOLMOD
Blocks are arranged per lme4’s conventions
For diagonal structures, uses blocked layout (all intercepts, then slopes)
build_lambda_template¶
build_lambda_template(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> LambdaTemplate
Build a Lambda template for efficient repeated updates.
Creates a sparse matrix template with the correct sparsity pattern, plus a mapping from theta indices to data array positions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n_groups_list | list[int] | Number of groups per grouping factor. | required |
re_structure | str | Random effects structure type. | required |
metadata | dict | None | Optional metadata dict with structure information. | None |
Returns:
| Type | Description |
|---|---|
LambdaTemplate | LambdaTemplate with template matrix and theta-to-data mapping. |
Examples:
>>> # Create template for random intercept model
>>> template = build_lambda_template([18], "intercept")
>>> template.template.shape
(18, 18)
>>> # Update with new theta values
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda.data[0]
1.5
Notes:
Template is built once per model, reused during optimization
theta_to_data_map[i] gives indices in template.data for theta[i]
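The theta-to-data mapping can be sketched for the simplest case, a random intercept. This mirrors the documented mechanism (theta_to_data_map[i] indexes template.data) but is a hand-built illustration, not the library's template construction.

```python
import numpy as np
import scipy.sparse as sp

# Random intercept with 4 groups: Lambda = theta[0] * I, so theta[0]
# maps to every stored diagonal entry of the template.
n_groups = 4
template = sp.identity(n_groups, format="csc")
theta_to_data_map = [np.arange(n_groups, dtype=np.int64)]

def update(template, theta_to_data_map, theta):
    # Overwrite stored values only; indices/indptr are never rebuilt.
    for i, idx in enumerate(theta_to_data_map):
        template.data[idx] = theta[i]
    return template

Lam = update(template, theta_to_data_map, np.array([1.5]))
# Lam is now 1.5 * I_4 with the original sparsity structure intact
```

Because only the data array is touched, repeated updates during optimization skip all index construction.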
build_pattern_template¶
build_pattern_template(Z: sp.csc_matrix, lambda_template: LambdaTemplate) -> PatternTemplate
Build pattern template for cross-theta Cholesky caching.
Computes the “maximal” sparsity patterns of ZL and S22 using theta=ones. These patterns are used to ensure consistent sparsity across all theta evaluations, even when theta components are zero (boundary conditions).
This matches lme4’s approach where the sparsity pattern is fixed at model initialization and never changes during optimization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse CSC. | required |
lambda_template | LambdaTemplate | LambdaTemplate with pattern information. | required |
Returns:
| Type | Description |
|---|---|
PatternTemplate | PatternTemplate with ZL and S22 patterns. |
Examples:
>>> Z = sp.random(100, 50, density=0.1, format='csc')
>>> lambda_tpl = build_lambda_template([50], "intercept")
>>> pattern_tpl = build_pattern_template(Z, lambda_tpl)
>>> # Now use pattern_tpl for pattern-preserved computations
cholesky_block_to_theta¶
cholesky_block_to_theta(L: np.ndarray) -> np.ndarray
Convert lower-triangular Cholesky block to theta vector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
L | ndarray | Lower triangular Cholesky block, shape (dim, dim). | required |
Returns:
| Type | Description |
|---|---|
ndarray | Theta vector (lower triangle elements, column-major). |
Examples:
>>> L = np.array([[1.5, 0.0], [0.3, 1.2]])
>>> theta = cholesky_block_to_theta(L)
>>> theta
array([1.5, 0.3, 1.2])
Notes:
Extracts lower triangle column-by-column
Inverse of theta_to_cholesky_block
compute_s22_preserve_pattern¶
compute_s22_preserve_pattern(ZL: sp.csc_matrix, W: sp.csc_matrix, S22_pattern: sp.csc_matrix) -> sp.csc_matrix
Compute ZL' @ W @ ZL + I while preserving S22_pattern's sparsity.
Similar to compute_zl_preserve_pattern, this ensures S22 always has the same sparsity pattern regardless of theta values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ZL | csc_matrix | Pattern-preserved ZL matrix from compute_zl_preserve_pattern. | required |
W | csc_matrix | Diagonal weight matrix. | required |
S22_pattern | csc_matrix | The “maximal” S22 pattern from build_pattern_template. | required |
Returns:
| Type | Description |
|---|---|
csc_matrix | S22 with S22_pattern’s sparsity structure but current values. |
compute_zl_preserve_pattern¶
compute_zl_preserve_pattern(Z: sp.csc_matrix, Lambda: sp.csc_matrix, ZL_pattern: sp.csc_matrix) -> sp.csc_matrix
Compute Z @ Lambda while preserving the sparsity pattern of ZL_pattern.
When theta has zeros (boundary), scipy’s Z @ Lambda prunes explicit zeros, changing the sparsity pattern. This function ensures the result always has the same pattern as ZL_pattern by adding explicit zeros where needed.
This is critical for CHOLMOD caching: the factorization reuses the symbolic analysis, which depends on having a consistent sparsity pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Z | csc_matrix | Random effects design matrix (n, q). | required |
Lambda | csc_matrix | Current Lambda matrix (may have zeros from boundary theta). | required |
ZL_pattern | csc_matrix | The “maximal” ZL pattern from build_pattern_template. | required |
Returns:
| Type | Description |
|---|---|
csc_matrix | ZL with ZL_pattern’s sparsity structure but current values. |
Notes:
Pattern positions not in Z @ Lambda are filled with explicit zeros.
This matches lme4’s updateLamtUt() which preserves pattern via manual iteration.
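Pattern preservation amounts to writing the product's values into the pattern's stored positions, keeping explicit zeros. The sketch below illustrates the idea on toy matrices; it is not the library's implementation (which stays fully sparse and avoids the dense intermediate).

```python
import numpy as np
import scipy.sparse as sp

def zl_preserve_pattern_sketch(Z, Lambda, ZL_pattern):
    """Write Z @ Lambda's values into ZL_pattern's stored positions."""
    prod = (Z @ Lambda).toarray()        # dense only for this toy illustration
    out = ZL_pattern.copy().tocsc()
    for j in range(out.shape[1]):
        start, end = out.indptr[j], out.indptr[j + 1]
        rows = out.indices[start:end]
        out.data[start:end] = prod[rows, j]  # explicit zeros are kept, not pruned
    return out

Z = sp.csc_matrix(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
pattern = (Z @ sp.identity(2, format="csc")).tocsc()  # "maximal" pattern (theta=ones)
Lam = sp.diags([0.0, 2.0], format="csc")              # boundary theta zeroes column 0
ZL = zl_preserve_pattern_sketch(Z, Lam, pattern)
# ZL stores the same positions as the pattern even though column 0 is all zeros
```

Keeping the stored positions fixed is what lets CHOLMOD reuse its symbolic factorization across theta evaluations.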
theta_to_cholesky_block¶
theta_to_cholesky_block(theta: np.ndarray, dim: int) -> np.ndarray
Convert theta vector to lower-triangular Cholesky block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (lower triangle, column-major). For 2x2: [L00, L10, L11], length = 3 For 3x3: [L00, L10, L11, L20, L21, L22], length = 6 | required |
dim | int | Dimension of the Cholesky block (2, 3, etc.). | required |
Returns:
| Type | Description |
|---|---|
ndarray | Lower triangular Cholesky block, shape (dim, dim). |
Examples:
>>> theta = np.array([1.5, 0.3, 1.2])
>>> L = theta_to_cholesky_block(theta, 2)
>>> L
array([[1.5, 0. ],
[0.3, 1.2]])
Notes:
Fills lower triangle column-by-column (Fortran order)
Diagonal elements should be positive for valid Cholesky
theta_to_variance_params¶
theta_to_variance_params(theta: np.ndarray, is_diagonal: bool = False) -> tuple[np.ndarray, np.ndarray]
Convert theta (Cholesky elements) to variance parameters.
Computes standard deviations and correlations from Cholesky factor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (lower triangle, column-major). For intercept: [L00], length = 1 For slopes (2D): [L00, L10, L11], length = 3 For diagonal (2D): [L00, L11], length = 2 | required |
is_diagonal | bool | If True, theta contains only diagonal elements (uncorrelated structure), so no correlations are computed. | False |
Returns:
| Type | Description |
|---|---|
tuple[ndarray, ndarray] | Tuple of (standard_deviations, correlations): |
tuple[ndarray, ndarray] | - standard_deviations: sqrt(diag(Sigma)), shape (n,) |
tuple[ndarray, ndarray] | - correlations: off-diagonal correlations, shape (n*(n-1)/2,) |
Examples:
>>> # Correlated slopes
>>> theta = np.array([1.5, 0.3, 1.2])
>>> sds, rhos = theta_to_variance_params(theta)
>>> sds # sqrt(diag(L @ L.T))
array([1.5 , 1.23693169])
>>> rhos # correlation between intercept and slope
array([0.24253563])
>>> # Uncorrelated slopes
>>> theta = np.array([1.5, 1.2])
>>> sds, rhos = theta_to_variance_params(theta, is_diagonal=True)
>>> sds
array([1.5, 1.2])
>>> rhos # empty for diagonal structure
array([])
Notes:
For diagonal structures, SDs are just theta values
For full structures, computes Sigma = L @ L.T then extracts params
update_lambda_from_template¶
update_lambda_from_template(template: LambdaTemplate, theta: np.ndarray) -> sp.csc_matrix
Update Lambda matrix from template using new theta values.
Efficiently updates only the data values without rebuilding structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
template | LambdaTemplate | LambdaTemplate created by build_lambda_template. | required |
theta | ndarray | New theta values. | required |
Returns:
| Type | Description |
|---|---|
csc_matrix | Updated sparse Lambda matrix in CSC format. |
Examples:
>>> template = build_lambda_template([18], "intercept")
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda[0, 0]
1.5
Notes:
Much faster than build_lambda_sparse for repeated calls
Modifies template.data in place for maximum efficiency
lambda_sparse¶
Sparse Lambda matrix construction and theta-Cholesky conversions.
Lambda is the Cholesky-like factor of the random effects covariance matrix. It is block-diagonal, with blocks corresponding to grouping factors.
Theta Parameterization (lme4-compatible):¶
Theta contains elements of the lower triangular Cholesky factor L, where the random effects covariance is Sigma = sigma^2 * Lambda * Lambda' and Lambda is built from theta.
Theta is on the relative scale (divided by sigma), following lme4’s convention.
Examples:
Random intercept (1|group): theta = [tau/sigma], dim = 1
Correlated slopes (1+x|group): theta = [L00/sigma, L10/sigma, L11/sigma], dim = 3
Uncorrelated slopes (1+x||group): theta = [L00/sigma, L11/sigma], dim = 2
Nested (1|school/class): theta = [tau1/sigma, tau2/sigma], dim = 2
Crossed (1|subject) + (1|item): theta = [tau1/sigma, tau2/sigma], dim = 2
During optimization, theta is constrained:
Diagonal elements: theta >= 0 (ensures PSD)
Off-diagonal elements: -inf < theta < inf
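The parameterization above can be verified numerically for the correlated-slopes case. The sketch reproduces the standard deviations and correlation quoted in the theta_to_variance_params examples, using only the stated relation Sigma = sigma^2 * Lambda * Lambda'.

```python
import numpy as np

# Correlated slopes (2x2 block): theta = [L00, L10, L11] on the relative scale.
theta = np.array([1.5, 0.3, 1.2])
L = np.array([[theta[0], 0.0],
              [theta[1], theta[2]]])

Sigma_rel = L @ L.T                        # relative covariance (sigma^2 factored out)
sds = np.sqrt(np.diag(Sigma_rel))          # relative SDs: [1.5, sqrt(0.3^2 + 1.2^2)]
rho = Sigma_rel[0, 1] / (sds[0] * sds[1])  # intercept-slope correlation
```

Multiplying sds by the residual sigma recovers the random-effect standard deviations on the response scale.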
Functions:
| Name | Description |
|---|---|
build_lambda_sparse | Build sparse block-diagonal Lambda matrix from theta. |
cholesky_block_to_theta | Convert lower-triangular Cholesky block to theta vector. |
theta_to_cholesky_block | Convert theta vector to lower-triangular Cholesky block. |
theta_to_variance_params | Convert theta (Cholesky elements) to variance parameters. |
Functions¶
build_lambda_sparse¶
build_lambda_sparse(theta: np.ndarray, n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> sp.csc_matrix
Build sparse block-diagonal Lambda matrix from theta.
Lambda is constructed based on the random effects structure:
Intercept: Diagonal with theta[i] repeated
Diagonal: Block-diagonal with diagonal blocks
Slope: Block-diagonal with full Cholesky blocks
Nested/Crossed: Concatenated blocks per grouping factor
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (lower triangle, column-major). | required |
n_groups_list | list[int] | Number of groups per grouping factor. For simple models: [n_groups] For nested/crossed: [n_groups_factor1, n_groups_factor2, ...] | required |
re_structure | str | Random effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed” | required |
metadata | dict | None | Optional metadata dict with structure information. May contain ‘re_structures_list’ for mixed models. | None |
Returns:
| Type | Description |
|---|---|
csc_matrix | Sparse block-diagonal Lambda matrix, CSC format. |
Examples:
>>> # Random intercept: (1|group)
>>> theta = np.array([1.5])
>>> Lambda = build_lambda_sparse(theta, [10], "intercept")
>>> Lambda.shape
(10, 10)
>>> # Correlated slopes: (1+x|group)
>>> theta = np.array([1.5, 0.3, 1.2]) # [L00, L10, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "slope")
>>> Lambda.shape
(36, 36)  # 18 groups x 2 RE per group
>>> # Uncorrelated slopes: (1+x||group)
>>> theta = np.array([1.5, 1.2]) # [L00, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "diagonal")
>>> Lambda.shape
(36, 36)
Notes:
Returns CSC format for compatibility with CHOLMOD
Blocks are arranged per lme4’s conventions
For diagonal structures, uses blocked layout (all intercepts, then slopes)
cholesky_block_to_theta¶
cholesky_block_to_theta(L: np.ndarray) -> np.ndarray
Convert lower-triangular Cholesky block to theta vector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
L | ndarray | Lower triangular Cholesky block, shape (dim, dim). | required |
Returns:
| Type | Description |
|---|---|
ndarray | Theta vector (lower triangle elements, column-major). |
Examples:
>>> L = np.array([[1.5, 0.0], [0.3, 1.2]])
>>> theta = cholesky_block_to_theta(L)
>>> theta
array([1.5, 0.3, 1.2])
Notes:
Extracts lower triangle column-by-column
Inverse of theta_to_cholesky_block
theta_to_cholesky_block¶
theta_to_cholesky_block(theta: np.ndarray, dim: int) -> np.ndarray
Convert theta vector to lower-triangular Cholesky block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (lower triangle, column-major). For 2x2: [L00, L10, L11], length = 3 For 3x3: [L00, L10, L11, L20, L21, L22], length = 6 | required |
dim | int | Dimension of the Cholesky block (2, 3, etc.). | required |
Returns:
| Type | Description |
|---|---|
ndarray | Lower triangular Cholesky block, shape (dim, dim). |
Examples:
>>> theta = np.array([1.5, 0.3, 1.2])
>>> L = theta_to_cholesky_block(theta, 2)
>>> L
array([[1.5, 0. ],
[0.3, 1.2]])
Notes:
Fills lower triangle column-by-column (Fortran order)
Diagonal elements should be positive for valid Cholesky
theta_to_variance_params¶
theta_to_variance_params(theta: np.ndarray, is_diagonal: bool = False) -> tuple[np.ndarray, np.ndarray]
Convert theta (Cholesky elements) to variance parameters.
Computes standard deviations and correlations from Cholesky factor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (lower triangle, column-major). For intercept: [L00], length = 1 For slopes (2D): [L00, L10, L11], length = 3 For diagonal (2D): [L00, L11], length = 2 | required |
is_diagonal | bool | If True, theta contains only diagonal elements (uncorrelated structure), so no correlations are computed. | False |
Returns:
| Type | Description |
|---|---|
tuple[ndarray, ndarray] | Tuple of (standard_deviations, correlations): |
tuple[ndarray, ndarray] | - standard_deviations: sqrt(diag(Sigma)), shape (n,) |
tuple[ndarray, ndarray] | - correlations: off-diagonal correlations, shape (n*(n-1)/2,) |
Examples:
>>> # Correlated slopes
>>> theta = np.array([1.5, 0.3, 1.2])
>>> sds, rhos = theta_to_variance_params(theta)
>>> sds # sqrt(diag(L @ L.T))
array([1.5 , 1.23693169])
>>> rhos # correlation between intercept and slope
array([0.24253563])
>>> # Uncorrelated slopes
>>> theta = np.array([1.5, 1.2])
>>> sds, rhos = theta_to_variance_params(theta, is_diagonal=True)
>>> sds
array([1.5, 1.2])
>>> rhos # empty for diagonal structure
array([])
Notes:
For diagonal structures, SDs are just theta values
For full structures, computes Sigma = L @ L.T then extracts params
lambda_template¶
Lambda matrix templates and sparsity pattern preservation.
Provides LambdaTemplate and PatternTemplate for efficient repeated Lambda construction during optimization. Templates cache sparsity patterns so that only data values need updating when theta changes.
Pattern preservation functions ensure consistent sparsity across all theta evaluations, even at boundary conditions (theta=0), matching lme4’s behavior.
Classes:
| Name | Description |
|---|---|
LambdaTemplate | Template for efficient Lambda matrix updates during optimization. |
PatternTemplate | Template for preserving sparsity patterns across theta evaluations. |
Functions:
| Name | Description |
|---|---|
build_lambda_template | Build a Lambda template for efficient repeated updates. |
build_pattern_template | Build pattern template for cross-theta Cholesky caching. |
compute_s22_preserve_pattern | Compute ZL' @ W @ ZL + I while preserving S22_pattern's sparsity. |
compute_zl_preserve_pattern | Compute Z @ Lambda while preserving the sparsity pattern of ZL_pattern. |
infer_diagonal_dim | Infer dimension for diagonal structure from metadata. |
infer_slope_dim | Infer dimension for slope structure from metadata. |
update_lambda_from_template | Update Lambda matrix from template using new theta values. |
Classes¶
LambdaTemplate¶
LambdaTemplate(template: sp.csc_matrix, theta_to_data_map: list[np.ndarray], re_structure: str, n_groups_list: list[int]) -> None
Template for efficient Lambda matrix updates during optimization.
Caches the sparsity pattern of Lambda so that repeated calls only update the data values, avoiding reconstruction of indices.
Attributes:
| Name | Type | Description |
|---|---|---|
template | csc_matrix | Sparse CSC matrix with correct structure but placeholder values. |
theta_to_data_map | list[ndarray] | List mapping theta indices to positions in template.data. Each element is an int64 array of indices into template.data. |
re_structure | str | The random effects structure type. |
n_groups_list | list[int] | Number of groups per factor. |
Attributes¶
n_groups_list¶
n_groups_list: list[int]
re_structure¶
re_structure: str
template¶
template: sp.csc_matrix
theta_to_data_map¶
theta_to_data_map: list[np.ndarray]
PatternTemplate¶
PatternTemplate(ZL_pattern: sp.csc_matrix, S22_pattern: sp.csc_matrix, Z: sp.csc_matrix, lambda_template: 'LambdaTemplate') -> None
Template for preserving sparsity patterns across theta evaluations.
This is critical for cross-theta Cholesky caching. When theta has boundary values (theta=0), the resulting matrices would normally have fewer non-zeros. This template ensures the sparsity pattern remains constant by storing explicit zeros where needed.
Used to match lme4’s behavior where the sparsity pattern is fixed at initialization and never changes during optimization.
Attributes:
| Name | Type | Description |
|---|---|---|
ZL_pattern | csc_matrix | Sparsity pattern of Z @ Lambda (computed with theta=ones). |
S22_pattern | csc_matrix | Sparsity pattern of ZL' @ ZL + I (computed with theta=ones). |
Z | csc_matrix | Reference to the original Z matrix for value computation. |
lambda_template | 'LambdaTemplate' | Reference to the LambdaTemplate. |
Attributes¶
S22_pattern¶
S22_pattern: sp.csc_matrix
Z¶
Z: sp.csc_matrix
ZL_pattern¶
ZL_pattern: sp.csc_matrix
lambda_template¶
lambda_template: 'LambdaTemplate'
Functions¶
build_lambda_template¶
build_lambda_template(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> LambdaTemplate
Build a Lambda template for efficient repeated updates.
Creates a sparse matrix template with the correct sparsity pattern, plus a mapping from theta indices to data array positions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n_groups_list | list[int] | Number of groups per grouping factor. | required |
re_structure | str | Random effects structure type. | required |
metadata | dict | None | Optional metadata dict with structure information. | None |
Returns:
| Type | Description |
|---|---|
LambdaTemplate | LambdaTemplate with template matrix and theta-to-data mapping. |
Examples:
>>> # Create template for random intercept model
>>> template = build_lambda_template([18], "intercept")
>>> template.template.shape
(18, 18)
>>> # Update with new theta values
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda.data[0]
1.5
Notes:
Template is built once per model, reused during optimization
theta_to_data_map[i] gives indices in template.data for theta[i]
build_pattern_template¶
build_pattern_template(Z: sp.csc_matrix, lambda_template: LambdaTemplate) -> PatternTemplate
Build pattern template for cross-theta Cholesky caching.
Computes the “maximal” sparsity patterns of ZL and S22 using theta=ones. These patterns are used to ensure consistent sparsity across all theta evaluations, even when theta components are zero (boundary conditions).
This matches lme4’s approach where the sparsity pattern is fixed at model initialization and never changes during optimization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse CSC. | required |
lambda_template | LambdaTemplate | LambdaTemplate with pattern information. | required |
Returns:
| Type | Description |
|---|---|
PatternTemplate | PatternTemplate with ZL and S22 patterns. |
Examples:
>>> Z = sp.random(100, 50, density=0.1, format='csc')
>>> lambda_tpl = build_lambda_template([50], "intercept")
>>> pattern_tpl = build_pattern_template(Z, lambda_tpl)
>>> # Now use pattern_tpl for pattern-preserved computations
compute_s22_preserve_pattern¶
compute_s22_preserve_pattern(ZL: sp.csc_matrix, W: sp.csc_matrix, S22_pattern: sp.csc_matrix) -> sp.csc_matrix
Compute ZL' @ W @ ZL + I while preserving S22_pattern's sparsity.
Similar to compute_zl_preserve_pattern, this ensures S22 always has the same sparsity pattern regardless of theta values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ZL | csc_matrix | Pattern-preserved ZL matrix from compute_zl_preserve_pattern. | required |
W | csc_matrix | Diagonal weight matrix. | required |
S22_pattern | csc_matrix | The “maximal” S22 pattern from build_pattern_template. | required |
Returns:
| Type | Description |
|---|---|
csc_matrix | S22 with S22_pattern’s sparsity structure but current values. |
compute_zl_preserve_pattern¶
compute_zl_preserve_pattern(Z: sp.csc_matrix, Lambda: sp.csc_matrix, ZL_pattern: sp.csc_matrix) -> sp.csc_matrix
Compute Z @ Lambda while preserving the sparsity pattern of ZL_pattern.
When theta has zeros (boundary), scipy’s Z @ Lambda prunes explicit zeros, changing the sparsity pattern. This function ensures the result always has the same pattern as ZL_pattern by adding explicit zeros where needed.
This is critical for CHOLMOD caching: the factorization reuses the symbolic analysis, which depends on having a consistent sparsity pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Z | csc_matrix | Random effects design matrix (n, q). | required |
Lambda | csc_matrix | Current Lambda matrix (may have zeros from boundary theta). | required |
ZL_pattern | csc_matrix | The “maximal” ZL pattern from build_pattern_template. | required |
Returns:
| Type | Description |
|---|---|
csc_matrix | ZL with ZL_pattern’s sparsity structure but current values. |
Notes:
Pattern positions not in Z @ Lambda are filled with explicit zeros.
This matches lme4’s updateLamtUt() which preserves pattern via manual iteration.
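The pattern-preserving product can be emulated with plain scipy. This is a rough sketch of the idea (not the library's implementation): scatter the values of Z @ Lambda onto the fixed pattern's positions, leaving explicit zeros wherever the product has none:

```python
import numpy as np
import scipy.sparse as sp

def zl_preserve_pattern_sketch(Z, Lambda, ZL_pattern):
    # Read the product at every pattern position; positions absent from the
    # product yield 0.0, which are then stored as explicit zeros, so the
    # result always has ZL_pattern's sparsity structure.
    product = (Z @ Lambda).tocsc()
    pat = ZL_pattern.tocoo()
    vals = np.asarray(product[pat.row, pat.col]).ravel()
    return sp.csc_matrix((vals, (pat.row, pat.col)), shape=ZL_pattern.shape)

# Boundary theta = 0 makes Z @ Lambda all-zero, yet the pattern survives:
Z = sp.random(20, 6, density=0.3, format="csc", random_state=0)
ZL_pattern = (Z @ sp.eye(6, format="csc")).tocsc()     # "maximal" pattern
ZL = zl_preserve_pattern_sketch(Z, 0.0 * sp.eye(6, format="csc"), ZL_pattern)
```

The key point is that scipy keeps explicit zeros unless `eliminate_zeros()` is called, so the stored structure stays identical across theta values.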
infer_diagonal_dim¶
infer_diagonal_dim(metadata: dict) -> int
Infer dimension for diagonal structure from metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
metadata | dict | Metadata dict possibly containing ‘re_terms’ or ‘random_names’. | required |
Returns:
| Type | Description |
|---|---|
int | Dimension of the diagonal structure. |
infer_slope_dim¶
infer_slope_dim(metadata: dict) -> int
Infer dimension for slope structure from metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
metadata | dict | Metadata dict possibly containing ‘re_terms’ or ‘random_names’. | required |
Returns:
| Type | Description |
|---|---|
int | Dimension of the slope structure. |
update_lambda_from_template¶
update_lambda_from_template(template: LambdaTemplate, theta: np.ndarray) -> sp.csc_matrix
Update Lambda matrix from template using new theta values.
Efficiently updates only the data values without rebuilding structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
template | LambdaTemplate | LambdaTemplate created by build_lambda_template. | required |
theta | ndarray | New theta values. | required |
Returns:
| Type | Description |
|---|---|
csc_matrix | Updated sparse Lambda matrix in CSC format. |
Examples:
>>> template = build_lambda_template([18], "intercept")
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda[0, 0]
1.5
Notes:
Much faster than build_lambda_sparse for repeated calls
Modifies template.data in place for maximum efficiency
lmer¶
Core LMM operations: PLS solving and deviance computation.
This module implements the Penalized Least Squares (PLS) formulation for linear mixed models, matching lme4’s canonical approach.
minimize: ||y - Xβ - ZΛu||² + ||u||²
where:
X is the fixed effects design matrix
Z is the random effects design matrix
Λ is the Cholesky-like factor from theta parameters
u are spherical random effects (u ~ N(0, I))
Architecture:
Uses scipy.sparse for sparse matrices and scikit-sparse for CHOLMOD
Dense linear algebra via backend (JAX JIT or NumPy), dispatched through _jit_cache
Returns NumPy arrays for downstream processing
lme4 factors only S22 = Λ’Z’ZΛ + I (not the full augmented system) and uses the Schur complement to solve for beta. See:
lme4/src/predModule.cpp lines 263-301 (updateDecomp)
lme4/src/predModule.cpp lines 189-203 (solve)
Classes:
| Name | Description |
|---|---|
PLSInvariants | Pre-computed quantities that don’t change during theta optimization. |
Functions:
| Name | Description |
|---|---|
apply_sqrt_weights | Apply sqrt(weights) transformation to design matrices and response. |
compute_pls_invariants | Pre-compute quantities that are constant during optimization. |
lmm_deviance_sparse | Compute LMM deviance for optimization. |
solve_pls_sparse | Solve Penalized Least Squares system using Schur complement. |
Classes¶
PLSInvariants¶
PLSInvariants(XtX: np.ndarray, Xty: np.ndarray, X_backend: np.ndarray, y_backend: np.ndarray) -> None
Pre-computed quantities that don’t change during theta optimization.
These are computed once when the model is set up and reused for every deviance evaluation during BOBYQA optimization.
Attributes:
| Name | Type | Description |
|---|---|---|
XtX | ndarray | Gram matrix X’X, shape (p, p). Backend array (JAX or NumPy). |
Xty | ndarray | Cross-product X’y, shape (p,). Backend array. |
X_backend | ndarray | Fixed effects design matrix as backend array, shape (n, p). |
y_backend | ndarray | Response vector as backend array, shape (n,). |
Attributes¶
X_backend¶
X_backend: np.ndarray
XtX¶
XtX: np.ndarray
Xty¶
Xty: np.ndarray
y_backend¶
y_backend: np.ndarray
Functions¶
apply_sqrt_weights¶
apply_sqrt_weights(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, weights: np.ndarray | None) -> tuple[np.ndarray, sp.csc_matrix, np.ndarray, np.ndarray | None]
Apply sqrt(weights) transformation to design matrices and response.
Transforms the weighted LMM problem into an unweighted problem on transformed data. This is the standard approach from lme4 and MixedModels.jl.
minimize ||W^{1/2}(y - Xβ - ZΛu)||² + ||u||²
After transformation (X_w, Z_w, y_w = W^{1/2}X, W^{1/2}Z, W^{1/2}y): minimize ||y_w - X_w·β - Z_w·Λu||² + ||u||²
This is the standard unweighted PLS, so the same solver applies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X | ndarray | Fixed effects design matrix, shape (n, p). | required |
Z | csc_matrix | Random effects design matrix, sparse csc, shape (n, q). | required |
y | ndarray | Response vector, shape (n,). | required |
weights | ndarray | None | Observation weights, shape (n,), or None for unweighted. | required |
Returns:
| Type | Description |
|---|---|
tuple[ndarray, csc_matrix, ndarray, ndarray | None] | Tuple of (X_w, Z_w, y_w, sqrtwts) where: X_w: Transformed X, shape (n, p) Z_w: Transformed Z, sparse csc, shape (n, q) y_w: Transformed y, shape (n,) sqrtwts: sqrt(weights), shape (n,), or None if weights is None |
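The transformation itself is simple row scaling. A minimal sketch mirroring the documented behavior (not the library source):

```python
import numpy as np
import scipy.sparse as sp

def apply_sqrt_weights_sketch(X, Z, y, weights):
    # Scale rows of X and Z and entries of y by sqrt(weights); the weighted
    # problem then becomes an unweighted PLS on the transformed data.
    if weights is None:
        return X, Z, y, None
    sw = np.sqrt(weights)
    X_w = X * sw[:, None]
    Z_w = (sp.diags(sw) @ Z).tocsc()   # row scaling keeps Z sparse
    y_w = y * sw
    return X_w, Z_w, y_w, sw

# Sanity check: the weighted residual norm equals the unweighted norm
# computed on the transformed data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)); y = rng.normal(size=50)
Z = sp.random(50, 5, density=0.2, format="csc", random_state=1)
w = rng.uniform(0.5, 2.0, size=50)
X_w, Z_w, y_w, sw = apply_sqrt_weights_sketch(X, Z, y, w)
beta = np.ones(3)
lhs = np.sum(w * (y - X @ beta) ** 2)
rhs = np.sum((y_w - X_w @ beta) ** 2)
```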
compute_pls_invariants¶
compute_pls_invariants(X: np.ndarray, y: np.ndarray) -> PLSInvariants
Pre-compute quantities that are constant during optimization.
Call this once before the optimization loop to avoid redundant computation. X’X and X’y don’t depend on theta, so computing them once saves work.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X | ndarray | Fixed effects design matrix, shape (n, p). | required |
y | ndarray | Response vector, shape (n,). | required |
Returns:
| Type | Description |
|---|---|
PLSInvariants | PLSInvariants with pre-computed X’X, X’y, and backend arrays. |
Examples:
>>> X = np.random.randn(100, 3)
>>> y = np.random.randn(100)
>>> inv = compute_pls_invariants(X, y)
>>> inv.XtX.shape
(3, 3)
lmm_deviance_sparse¶
lmm_deviance_sparse(theta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, n_groups_list: list[int], re_structure: str, method: str = 'REML', lambda_template: 'LambdaTemplate | None' = None, pls_invariants: PLSInvariants | None = None, metadata: dict | None = None, sqrtwts: np.ndarray | None = None) -> float
Compute LMM deviance for optimization.
The deviance is -2 * log-likelihood, profiled over β and σ².
-2 log L = log|Λ’Z’ZΛ+I| + n·log(2π·PWRSS/n) + n
-2 REML = log|Λ’Z’ZΛ+I| + log|X’V⁻¹X| + (n-p)·log(2π·PWRSS/(n-p)) + (n-p)
For weighted models, the deviance includes an additional adjustment: -log|W| = -2*sum(log(sqrtwts))
This follows lme4 and MixedModels.jl: X, Z, y should already be pre-transformed by sqrt(weights) using apply_sqrt_weights().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (lower triangle, column-major). | required |
X | ndarray | Fixed effects design matrix, shape (n, p). Should be pre-transformed by sqrt(weights) if weighted. | required |
Z | csc_matrix | Random effects design matrix, shape (n, q), scipy sparse. Should be pre-transformed by sqrt(weights) if weighted. | required |
y | ndarray | Response vector, shape (n,). Should be pre-transformed by sqrt(weights) if weighted. | required |
n_groups_list | list[int] | Number of groups per grouping factor. | required |
re_structure | str | Random effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed” | required |
method | str | Estimation method, either “ML” or “REML”. | ‘REML’ |
lambda_template | ‘LambdaTemplate | None’ | Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse. This avoids rebuilding the sparsity pattern on each call. | None |
pls_invariants | PLSInvariants | None | Optional pre-computed invariants (X’X, X’y, JAX arrays). If provided, avoids redundant matrix computation in optimization loop. Use compute_pls_invariants() to create this before the loop. | None |
metadata | dict | None | Optional metadata dict with structure information. Required for crossed/nested/mixed structures with non-intercept factors. Should contain ‘re_structures_list’ specifying each factor’s structure. | None |
sqrtwts | ndarray | None | Optional sqrt(weights), shape (n,). If provided, adds the log-determinant adjustment -2*sum(log(sqrtwts)) to the deviance. This compensates for the Jacobian from the weight transformation. | None |
Returns:
| Type | Description |
|---|---|
float | Deviance value (scalar). |
Examples:
>>> import numpy as np
>>> import scipy.sparse as sp
>>> # Simple random intercept model
>>> theta = np.array([1.5])
>>> X = np.random.randn(100, 2)
>>> Z = sp.random(100, 10, density=0.1, format='csc')
>>> y = np.random.randn(100)
>>> dev = lmm_deviance_sparse(theta, X, Z, y, [10], "intercept", "REML")
>>> dev > 0
True
Notes:
Requires lambda_builder module for building Λ from theta
Returns ML deviance if method=“ML”, REML criterion if method=“REML”
Used as objective function in BOBYQA optimization
Pass lambda_template for ~10-20% speedup in optimization loop
Pass pls_invariants for additional speedup (avoids X’X, X’y computation)
For weighted models, pass sqrtwts and pre-transformed X, Z, y
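As a cross-check of the ML formula above, the deviance for a scalar-theta (random intercept) model can be computed densely. This is an illustrative reference implementation under that assumption, not the sparse production path:

```python
import numpy as np

def lmm_ml_deviance_dense(theta, X, Z, y):
    # Dense reference for the profiled ML deviance with Lambda = theta * I.
    n, q = Z.shape
    ZL = theta * Z                          # Z @ (theta * I)
    S22 = ZL.T @ ZL + np.eye(q)
    L = np.linalg.cholesky(S22)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    # Jointly solve the penalized LS for (u, beta) via the augmented system.
    A = np.block([[S22, ZL.T @ X], [X.T @ ZL, X.T @ X]])
    b = np.concatenate([ZL.T @ y, X.T @ y])
    sol = np.linalg.solve(A, b)
    u, beta = sol[:q], sol[q:]
    pwrss = np.sum((y - X @ beta - ZL @ u) ** 2) + np.sum(u ** 2)
    return logdet + n * np.log(2 * np.pi * pwrss / n) + n
```

At theta = 0 this collapses to the OLS deviance n·log(2π·RSS/n) + n, which makes a convenient sanity check.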
solve_pls_sparse¶
solve_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, y: np.ndarray, pls_invariants: PLSInvariants | None = None) -> dict
Solve Penalized Least Squares system using Schur complement.
Uses lme4’s algorithm: factor only S22 = Λ’Z’ZΛ + I, then use Schur complement to solve for beta. This avoids factoring the full augmented system.
Algorithm (matching lme4/src/predModule.cpp):
1. Factor S22 = Λ’Z’ZΛ + I via CHOLMOD
2. Compute RZX = L^{-1} * Λ’Z’X (forward solve)
3. Compute Schur = X’X - RZX’ * RZX (rank update)
4. Solve for beta via Schur complement
5. Back-solve for u
ZΛ is kept sparse throughout. For InstEval-scale data (73k obs × 4k groups), this uses ~50MB instead of ~2.4GB per iteration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X | ndarray | Fixed effects design matrix, shape (n, p). | required |
Z | csc_matrix | Random effects design matrix, shape (n, q), scipy sparse CSC. | required |
Lambda | csc_matrix | Cholesky-like factor, shape (q, q), scipy sparse CSC. | required |
y | ndarray | Response vector, shape (n,). | required |
pls_invariants | PLSInvariants | None | Optional pre-computed invariants (X’X, X’y, backend arrays). If provided, avoids redundant computation in optimization loop. Use compute_pls_invariants() to create this before the loop. | None |
Returns:
| Type | Description |
|---|---|
dict | Dictionary with keys: |
dict | - beta: Fixed effects coefficients, shape (p,). |
dict | - u: Spherical random effects, shape (q,). |
dict | - logdet_L: Log-determinant of (Λ’Z’ZΛ + I). |
dict | - rss: Residual sum of squares |
dict | - logdet_RX: Log-determinant of Schur complement (for REML). |
Examples:
>>> import numpy as np
>>> import scipy.sparse as sp
>>> n, p, q = 100, 3, 10
>>> X = np.random.randn(n, p)
>>> Z = sp.random(n, q, density=0.1, format='csc')
>>> Lambda = sp.eye(q, format='csc')
>>> y = np.random.randn(n)
>>> result = solve_pls_sparse(X, Z, Lambda, y)
>>> result['beta'].shape
(3,)
>>> result['u'].shape
(10,)
Notes:
Converts Z and Lambda to CSC format if needed (CHOLMOD requirement)
Uses CHOLMOD for numerically stable sparse Cholesky of S22
Dense linear algebra uses JIT-compiled functions (via _get_dense_ops)
ZΛ stays sparse; cross-products use efficient sparse-dense ops
Returns RSS without penalty term (PWRSS = RSS + ||u||²)
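The five algorithm steps can be mirrored densely in a few lines. This sketch is for intuition only (the real code keeps ZΛ sparse and factors S22 with CHOLMOD); it is algebraically equivalent to solving the full augmented system:

```python
import numpy as np

def solve_pls_schur_dense(X, ZL, y):
    # Dense analogue of the Schur-complement solve, with ZL = Z @ Lambda.
    q = ZL.shape[1]
    S22 = ZL.T @ ZL + np.eye(q)                  # step 1: form/factor S22
    L = np.linalg.cholesky(S22)
    RZX = np.linalg.solve(L, ZL.T @ X)           # step 2: RZX = L^{-1} Λ’Z’X
    cu = np.linalg.solve(L, ZL.T @ y)            #         forward solve for RHS
    schur = X.T @ X - RZX.T @ RZX                # step 3: rank update
    beta = np.linalg.solve(schur, X.T @ y - RZX.T @ cu)   # step 4: beta
    u = np.linalg.solve(L.T, cu - RZX @ beta)    # step 5: back-solve for u
    return beta, u
```

Eliminating u from the augmented system gives exactly the Schur system `(X’X - RZX’RZX) beta = X’y - RZX’cu`, so both routes return the same solution.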
optimize¶
BOBYQA optimizer via NLOPT for variance component optimization.
Functions:
| Name | Description |
|---|---|
optimize_theta | Optimize theta using BOBYQA via NLOPT. |
Functions¶
optimize_theta¶
optimize_theta(objective: Callable, theta0: np.ndarray, lower: np.ndarray, upper: np.ndarray, rhobeg: float = 0.002, rhoend: float = 2e-07, maxfun: int = 10000, verbose: bool = False) -> dict
Optimize theta using BOBYQA via NLOPT.
BOBYQA is Powell’s derivative-free trust-region method for bound-constrained optimization. This implementation uses NLOPT’s LN_BOBYQA algorithm.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
objective | Callable | Function f(theta) -> scalar deviance to minimize. | required |
theta0 | ndarray | Initial parameter values (array-like). | required |
lower | ndarray | Lower bounds for parameters. | required |
upper | ndarray | Upper bounds for parameters. | required |
rhobeg | float | Initial trust region radius (sets initial step size). Default 0.002 (lme4’s default). | 0.002 |
rhoend | float | Final trust region radius (used to set tolerances). Default 2e-7 (lme4 default). | 2e-07 |
maxfun | int | Maximum function evaluations. Default 10000. | 10000 |
verbose | bool | Print optimization progress. | False |
Returns:
| Type | Description |
|---|---|
dict | dict with keys: - theta: Optimized parameters (np.ndarray) - fun: Final objective value (float) - converged: Whether optimization succeeded (bool) - n_evals: Number of function evaluations (int) - message: Status message (str) |
Examples:
>>> def rosenbrock(theta):
... x, y = theta
... return (1 - x)**2 + 100 * (y - x**2)**2
>>> result = optimize_theta(
... rosenbrock,
... theta0=[0.0, 0.0],
... lower=[-2.0, -2.0],
... upper=[2.0, 2.0]
... )
>>> result['converged']
True
>>> np.allclose(result['theta'], [1.0, 1.0], atol=1e-3)
True
pirls_sparse¶
Sparse PIRLS (Penalized Iteratively Reweighted Least Squares) core routines.
This module contains the inner-loop routines for fitting generalized linear mixed
models via the Laplace approximation. These are called by the orchestrator in
glmer.py and are not intended for direct use.
References: lme4 external.cpp (pwrssUpdate), MixedModels.jl pirls!
Classes¶
Functions¶
compute_glmm_loglik¶
compute_glmm_loglik(y: np.ndarray, mu: np.ndarray, family: Family, sqrL: float, logdet: float, prior_weights: np.ndarray | None = None) -> float
Compute GLMM log-likelihood via Laplace approximation.
Matches lme4’s logLik() for GLMMs: loglik = cond_loglik - 0.5 * (||u||^2 + logdet)
where cond_loglik is the full conditional log-likelihood including normalizing constants (log(y!), log(choose(m,y)), etc.).
For weighted binomial (grouped), uses the full binomial log-likelihood with log(choose(m, y_counts)) to match lme4’s aic() function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y | ndarray | Response values (n,). For weighted binomial, these are proportions. | required |
mu | ndarray | Fitted values (n,). | required |
family | Family | GLM family with loglik method. | required |
sqrL | float | Sum of squared spherical random effects, ||u||^2. | required |
logdet | float | Log-determinant of (L’Z’WZL + I). | required |
prior_weights | ndarray | None | Prior weights (n,). For binomial proportions, the number of trials per observation. | None |
Returns:
| Type | Description |
|---|---|
float | Log-likelihood (scalar). |
compute_irls_quantities¶
compute_irls_quantities(y: np.ndarray, eta: np.ndarray, family: Family) -> tuple[np.ndarray, np.ndarray]
Compute IRLS working weights and working response.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y | ndarray | Response values (n,) | required |
eta | ndarray | Linear predictor (n,) | required |
family | Family | GLM family object | required |
Returns:
| Name | Type | Description |
|---|---|---|
working_weights | ndarray | IRLS weights w = 1 / (V(mu) * (deta/dmu)^2) |
working_response | ndarray | Pseudo-response z = eta + (y - mu) * deta/dmu |
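For the binomial family with a logit link, the two formulas specialize nicely. A hand-rolled sketch (the library dispatches on the Family object instead):

```python
import numpy as np

def irls_quantities_logit(y, eta):
    # Logit link: mu = sigmoid(eta), dmu/deta = mu(1-mu), V(mu) = mu(1-mu),
    # so w = 1 / (V(mu) * (deta/dmu)^2) simplifies to mu(1-mu).
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = mu * (1.0 - mu)
    z = eta + (y - mu) / w          # z = eta + (y - mu) * deta/dmu
    return w, z

# At eta = 0: mu = 0.5, so w = 0.25 and z = (y - 0.5) / 0.25.
w, z = irls_quantities_logit(np.array([0.0, 1.0]), np.array([0.0, 0.0]))
```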
glmm_deviance¶
glmm_deviance(y: np.ndarray, mu: np.ndarray, family: Family, logdet: float, sqrL: float, prior_weights: np.ndarray | None = None) -> float
Compute GLMM deviance via Laplace approximation.
deviance_GLMM = sum(dev_resid) + logdet + ||u||^2
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y | ndarray | Response values (n,) | required |
mu | ndarray | Fitted values mu = g^{-1}(eta) (n,) | required |
family | Family | GLM family | required |
logdet | float | Log-determinant of (L’Z’WZL + I). | required |
sqrL | float | Sum of squared spherical random effects, ||u||^2. | required |
prior_weights | ndarray | None | Prior weights for observations (n,). For binomial with proportions, this is the number of trials. | None |
Returns:
| Type | Description |
|---|---|
float | GLMM deviance (scalar) |
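For a concrete family, the Poisson case of the formula above can be written out directly. This is a sketch (the library computes dev_resid via the Family object; `glmm_deviance_poisson` is a hypothetical helper name):

```python
import numpy as np

def glmm_deviance_poisson(y, mu, logdet, u):
    # Poisson deviance residuals: 2*(y*log(y/mu) - (y - mu)), with the
    # y == 0 term taken as 0; then add the Laplace terms logdet + ||u||^2.
    safe_y = np.where(y > 0, y, 1.0)               # avoid log(0)
    term = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    dev_resid = 2.0 * (term - (y - mu))
    return float(dev_resid.sum() + logdet + np.sum(u ** 2))
```

When mu equals y exactly, the deviance residuals vanish and only the Laplace terms remain.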
pirls_sparse¶
pirls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, y: np.ndarray, family: Family, eta_init: np.ndarray | None = None, prior_weights: np.ndarray | None = None, beta_fixed: np.ndarray | None = None, max_iter: int = 30, tol: float = 1e-07, verbose: bool = False, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict
Penalized Iteratively Reweighted Least Squares using sparse operations.
For fixed variance parameters encoded in Lambda, iteratively solves for beta and u using weighted penalized least squares. This is the inner loop of GLMM optimization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X | ndarray | Fixed effects design matrix (n, p) | required |
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse | required |
Lambda | csc_matrix | Cholesky-like factor from theta (q, q), scipy sparse | required |
y | ndarray | Response vector (n,) | required |
family | Family | GLM family (binomial, poisson) | required |
eta_init | ndarray | None | Initial linear predictor (optional) | None |
prior_weights | ndarray | None | Prior observation weights (n,). For binomial with proportions, this should be the number of trials. | None |
beta_fixed | ndarray | None | If provided, solve only for u with beta held fixed. This is used in Stage 2 optimization to evaluate deviance at arbitrary (theta, beta) points without re-optimizing beta. | None |
max_iter | int | Maximum PIRLS iterations | 30 |
tol | float | Convergence tolerance on relative deviance change (1e-7 matches lme4) | 1e-07 |
verbose | bool | Print iteration info | False |
factor_S22 | ‘SparseFactorization | None’ | Cached Cholesky factor for S22 block (L’Z’WZL + I). If provided, reuses symbolic analysis across calls. Useful for caching across theta evaluations in the optimizer. | None |
factor_S | ‘SparseFactorization | None’ | Cached Cholesky factor for full augmented system. Only used when beta_fixed is None (full solve mode). | None |
pattern_template | ‘PatternTemplate | None’ | PatternTemplate for preserving sparsity patterns. When provided, ensures S22 has consistent pattern for cross-theta caching. This is required for cross-theta factor caching to work correctly when theta has boundary values (theta=0). | None |
Returns:
| Type | Description |
|---|---|
dict | Dictionary with keys: |
dict | - beta: Fixed effects coefficients (p,) |
dict | - u: Spherical random effects (q,) |
dict | - eta: Final linear predictor (n,) |
dict | - mu: Final fitted values (n,) |
dict | - deviance: GLMM deviance at convergence |
dict | - converged: Whether PIRLS converged |
dict | - n_iter: Number of iterations |
dict | - logdet_L: Log-determinant |
dict | - factor_S22: Cholesky factor for S22 (for caching) |
dict | - factor_S: Cholesky factor for full system (for caching, full mode only) |
solve_weighted_pls_sparse¶
solve_weighted_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, z: np.ndarray, weights: np.ndarray, beta_fixed: np.ndarray | None = None, ZL: sp.csc_matrix | None = None, ZL_dense: np.ndarray | None = None, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict
Solve weighted Penalized Least Squares for GLMM. Solves the augmented system
[X’WX     X’WZL       ] [beta]   [X’Wz  ]
[L’Z’WX   L’Z’WZL + I ] [u   ] = [L’Z’Wz]
using sparse Cholesky factorization.
If beta_fixed is provided, solves only for u with beta held constant: [L’Z’WZL + I] [u] = [L’Z’W(z - X*beta)]
This is critical for Stage 2 optimization where we need to evaluate deviance at arbitrary (theta, beta) points without PIRLS re-optimizing beta.
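The beta-fixed branch is small enough to sketch densely (illustration only; the library solves it with a sparse Cholesky):

```python
import numpy as np

def solve_u_beta_fixed_dense(X, ZL, z, weights, beta):
    # Solves [ZL' W ZL + I] u = ZL' W (z - X beta), W = diag(weights),
    # i.e. the minimizer of ||W^{1/2}(z - X beta - ZL u)||^2 + ||u||^2
    # with beta held constant.
    q = ZL.shape[1]
    WZL = weights[:, None] * ZL         # W @ ZL without forming diag(W)
    S22 = ZL.T @ WZL + np.eye(q)
    rhs = WZL.T @ (z - X @ beta)
    return np.linalg.solve(S22, rhs)
```

Because the objective is strictly convex in u (the +I penalty), the solution of this system is the unique minimizer for the given beta.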
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X | ndarray | Fixed effects design matrix (n, p) | required |
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse CSC | required |
Lambda | csc_matrix | Cholesky-like factor (q, q), scipy sparse CSC | required |
z | ndarray | Working response (n,) | required |
weights | ndarray | IRLS weights (n,) | required |
beta_fixed | ndarray | None | If provided, solve only for u with beta fixed to this value. | None |
ZL | csc_matrix | None | Pre-computed Z @ Lambda (sparse). If provided, avoids redundant computation when called repeatedly with same Z and Lambda. | None |
ZL_dense | ndarray | None | Pre-computed ZL.toarray() (dense). If provided, avoids repeated sparse-to-dense materialization on each call. | None |
factor_S22 | ‘SparseFactorization | None’ | Cached Cholesky factor for S22 block. If provided, uses cholesky_inplace() to update with new values (same sparsity pattern). This avoids expensive symbolic analysis on repeated calls. | None |
factor_S | ‘SparseFactorization | None’ | Cached Cholesky factor for full augmented system S. Only used in full solve mode (when beta_fixed is None). | None |
pattern_template | ‘PatternTemplate | None’ | PatternTemplate for preserving sparsity patterns. When provided, ensures S22 has consistent pattern for cross-theta caching. Required when using factor_S22 with cross-theta caching. | None |
Returns:
| Type | Description |
|---|---|
dict | Dictionary with keys: |
dict | - beta: Fixed effects coefficients (p,) |
dict | - u: Spherical random effects (q,) |
dict | - logdet_L: Log-determinant of (L’Z’WZL + I) |
dict | - fitted_z: Fitted working response Xbeta + ZL*u |
dict | - factor_S22: Cholesky factor for S22 (for caching) |
dict | - factor_S: Cholesky factor for full system S (for caching, full mode only) |
quadrature¶
Adaptive Gauss-Hermite Quadrature (AGQ) for GLMM marginal likelihood.
This module implements nAGQ > 1 integration for scalar random intercept models. The AGQ deviance replaces the Laplace deviance in Stage 2 of GLMM fitting when higher accuracy is needed (few observations per group, extreme probabilities).
Restriction: nAGQ > 1 only works for models with a single scalar random
intercept (e.g., (1|group)). This matches lme4’s restriction.
Math¶
For group i with conditional mode u_hat_i and conditional SD
sigma_hat_i:
.. math::
u_{ij} = \hat{u}_i + \sqrt{2} \, \sigma_i \, z_j
\quad (\text{adaptive nodes})
h(u_{ij}) = \sum_{k \in i} \log f(y_k | \mu(\eta_{ij,k}))
- \tfrac{1}{2} u_{ij}^2
\quad (\text{log integrand})
\log I_i = \log(\sigma_i)
+ \text{logsumexp}_j\!\bigl(
\log(w_j / \sqrt{\pi}) + h(u_{ij}) + z_j^2
\bigr)
\text{deviance} = -2 \sum_i \log I_i
Reference¶
Pinheiro & Bates (1995), Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model, JCGS 4(1):12-35.
lme4 source: glmFamily.cpp aic() and laplace().
Functions:
| Name | Description |
|---|---|
compute_agq_deviance | Compute AGQ deviance for Stage 2 optimization. |
Classes¶
Functions¶
compute_agq_deviance¶
compute_agq_deviance(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, group_ids: np.ndarray, nAGQ: int, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, eta_init: np.ndarray | None = None, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None) -> float
Compute AGQ deviance for Stage 2 optimization.
This is the objective function for nAGQ > 1. It:
1. Runs PIRLS with beta_fixed to get conditional modes u_hat
2. Computes per-group conditional SDs from the S22 diagonal
3. Evaluates the AGQ integral using Gauss-Hermite quadrature
4. Returns -2 * sum_i(log_integral_i)
Only valid for scalar random intercept models (single grouping factor, intercept only).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta | ndarray | Cholesky factor elements (scalar for random intercept). | required |
beta | ndarray | Fixed effects coefficients (p,). | required |
X | ndarray | Fixed effects design matrix (n, p). | required |
Z | csc_matrix | Random effects design matrix (n, q), scipy sparse. | required |
y | ndarray | Response vector (n,). | required |
family | Family | GLM family (binomial, poisson). | required |
n_groups_list | list[int] | Number of groups per grouping factor. | required |
re_structure | str | Random effects structure type. | required |
group_ids | ndarray | Group membership array (n,), integer-coded. | required |
nAGQ | int | Number of quadrature points (must be > 1). | required |
metadata | dict | None | Additional RE metadata. | None |
prior_weights | ndarray | None | Prior observation weights. | None |
pirls_max_iter | int | Maximum PIRLS iterations per evaluation. | 30 |
pirls_tol | float | PIRLS convergence tolerance. | 1e-07 |
eta_init | ndarray | None | Initial linear predictor for PIRLS. | None |
lambda_template | ‘LambdaTemplate | None’ | Pre-built template for efficient Lambda updates. | None |
factor_cache | dict | None | Mutable dict for caching Cholesky factors. | None |
Returns:
| Type | Description |
|---|---|
float | AGQ deviance (scalar) = -2 * sum(log marginal likelihood). |
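The per-group integral from the Math section can be sketched directly from the formulas. `agq_log_integral` is a hypothetical helper, assuming h(u) = Σ_k log f(y_k|u) − u²/2 as defined there:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def agq_log_integral(h, u_hat, sigma_hat, n_nodes):
    # log I_i = log(sigma_i)
    #         + logsumexp_j( log(w_j / sqrt(pi)) + h(u_ij) + z_j^2 )
    z, w = hermgauss(n_nodes)                      # standard GH nodes/weights
    u = u_hat + np.sqrt(2.0) * sigma_hat * z       # adaptive nodes
    terms = np.log(w / np.sqrt(np.pi)) + np.array([h(ui) for ui in u]) + z**2
    m = terms.max()                                # numerically stable logsumexp
    return np.log(sigma_hat) + m + np.log(np.exp(terms - m).sum())

# Sanity check: with no data, h(u) = -u^2/2, the integrand is the standard
# normal density up to its normalizer, and log I_i should be exactly 0.
val = agq_log_integral(lambda u: -0.5 * u**2, 0.0, 1.0, 9)
```

For a whole model, the deviance is then -2 times the sum of these per-group log integrals, matching the formula above.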