
Model-specific fitting algorithms (IRLS, PLS, PIRLS).

Classes:

- LambdaTemplate: Template for efficient Lambda matrix updates during optimization.
- PLSInvariants: Pre-computed quantities that don’t change during theta optimization.
- PatternTemplate: Template for preserving sparsity patterns across theta evaluations.

Functions:

- apply_sqrt_weights: Apply sqrt(weights) transformation to design matrices and response.
- build_lambda_sparse: Build sparse block-diagonal Lambda matrix from theta.
- build_lambda_template: Build a Lambda template for efficient repeated updates.
- compute_agq_deviance: Compute AGQ deviance for Stage 2 optimization.
- compute_irls_quantities: Compute IRLS working weights and working response.
- compute_pls_invariants: Pre-compute quantities that are constant during optimization.
- fit_glm_irls: Fit GLM using IRLS algorithm.
- fit_glmm_pirls: Fit GLMM using PIRLS with outer optimization over theta.
- glmm_deviance: Compute GLMM deviance via Laplace approximation.
- glmm_deviance_objective: Compute GLMM deviance for outer optimization.
- lmm_deviance_sparse: Compute LMM deviance for optimization.
- optimize_theta: Optimize theta using BOBYQA via NLOPT.
- solve_pls_sparse: Solve Penalized Least Squares system using Schur complement.
- solve_weighted_pls_sparse: Solve weighted Penalized Least Squares for GLMM.
- theta_to_cholesky_block: Convert theta vector to lower-triangular Cholesky block.
- update_lambda_from_template: Update Lambda matrix from template using new theta values.

Modules:

- glm: GLM IRLS fitting algorithm.
- glmer: PIRLS (Penalized Iteratively Reweighted Least Squares) algorithm for GLMMs.
- heuristics: Heuristics and numerical utilities for GLMM fitting.
- initialization: Moment-based initialization for LMM variance components.
- lambda_builder: Lambda matrix construction from theta parameters.
- lambda_sparse: Sparse Lambda matrix construction and theta-Cholesky conversions.
- lambda_template: Lambda matrix templates and sparsity pattern preservation.
- lmer: Core LMM operations: PLS solving and deviance computation.
- optimize: BOBYQA optimizer via NLOPT for variance component optimization.
- pirls_sparse: Sparse PIRLS (Penalized Iteratively Reweighted Least Squares) core routines.
- quadrature: Adaptive Gauss-Hermite Quadrature (AGQ) for GLMM marginal likelihood.

Classes

LambdaTemplate

LambdaTemplate(template: sp.csc_matrix, theta_to_data_map: list[np.ndarray], re_structure: str, n_groups_list: list[int]) -> None

Template for efficient Lambda matrix updates during optimization.

Caches the sparsity pattern of Lambda so that repeated calls only update the data values, avoiding reconstruction of indices.

Attributes:

- template (csc_matrix): Sparse CSC matrix with correct structure but placeholder values.
- theta_to_data_map (list[ndarray]): List mapping theta indices to positions in template.data. Each element is an int64 array of indices into template.data.
- re_structure (str): The random effects structure type.
- n_groups_list (list[int]): Number of groups per factor.

PLSInvariants

PLSInvariants(XtX: np.ndarray, Xty: np.ndarray, X_backend: np.ndarray, y_backend: np.ndarray) -> None

Pre-computed quantities that don’t change during theta optimization.

These are computed once when the model is set up and reused for every deviance evaluation during BOBYQA optimization.

Attributes:

- XtX (ndarray): Gram matrix X’X, shape (p, p). Backend array (JAX or NumPy).
- Xty (ndarray): Cross-product X’y, shape (p,). Backend array.
- X_backend (ndarray): Fixed effects design matrix as backend array, shape (n, p).
- y_backend (ndarray): Response vector as backend array, shape (n,).

PatternTemplate

PatternTemplate(ZL_pattern: sp.csc_matrix, S22_pattern: sp.csc_matrix, Z: sp.csc_matrix, lambda_template: 'LambdaTemplate') -> None

Template for preserving sparsity patterns across theta evaluations.

This is critical for cross-theta Cholesky caching. When theta has boundary values (theta=0), the resulting matrices would normally have fewer non-zeros. This template ensures the sparsity pattern remains constant by storing explicit zeros where needed.

Used to match lme4’s behavior where the sparsity pattern is fixed at initialization and never changes during optimization.
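The explicit-zero trick can be illustrated with plain scipy (a minimal sketch, not this module’s implementation; `update_values` is a hypothetical helper):

```python
import numpy as np
import scipy.sparse as sp

# Build the pattern at theta = 1 so every structural position is stored.
pattern = sp.csc_matrix(np.diag([1.0, 1.0, 1.0]))

def update_values(pattern, theta):
    """Write new values into the stored entries only (hypothetical helper)."""
    out = pattern.copy()
    out.data[:] = theta  # no eliminate_zeros(): zeros stay as stored entries
    return out

A = update_values(pattern, 0.0)  # boundary value theta = 0
print(A.nnz)  # → 3: explicit zeros keep the sparsity pattern fixed
```

Because the `indices`/`indptr` arrays never change, a cached symbolic Cholesky analysis remains valid at every theta, including the boundary.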

Attributes:

- ZL_pattern (csc_matrix): Sparsity pattern of Z @ Lambda (computed with theta=ones).
- S22_pattern (csc_matrix): Sparsity pattern of ZL’ @ ZL + I (computed with theta=ones).
- Z (csc_matrix): Reference to the original Z matrix for value computation.
- lambda_template (LambdaTemplate): Reference to the LambdaTemplate.

Functions

apply_sqrt_weights

apply_sqrt_weights(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, weights: np.ndarray | None) -> tuple[np.ndarray, sp.csc_matrix, np.ndarray, np.ndarray | None]

Apply sqrt(weights) transformation to design matrices and response.

Transforms the weighted LMM problem into an unweighted problem on transformed data. This is the standard approach from lme4 and MixedModels.jl.

minimize ||W^{1/2}(y - Xβ - ZΛu)||² + ||u||²

After transformation (X_w, Z_w, y_w = W^{1/2}X, W^{1/2}Z, W^{1/2}y): minimize ||y_w - X_w·β - Z_w·Λu||² + ||u||²

This is the standard unweighted PLS, so the same solver applies.

Parameters:

- X (ndarray): Fixed effects design matrix, shape (n, p). Required.
- Z (csc_matrix): Random effects design matrix, sparse CSC, shape (n, q). Required.
- y (ndarray): Response vector, shape (n,). Required.
- weights (ndarray | None): Observation weights, shape (n,), or None for unweighted. Required.

Returns:

tuple[ndarray, csc_matrix, ndarray, ndarray | None]: Tuple of (X_w, Z_w, y_w, sqrtwts) where:

- X_w: Transformed X, shape (n, p)
- Z_w: Transformed Z, sparse CSC, shape (n, q)
- y_w: Transformed y, shape (n,)
- sqrtwts: sqrt(weights), shape (n,), or None if weights is None
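A minimal sketch of the transformation (the function name here is hypothetical; the actual implementation lives in this module):

```python
import numpy as np
import scipy.sparse as sp

def apply_sqrt_weights_sketch(X, Z, y, weights):
    """Scale the rows of X, Z and y by sqrt(weights) (sketch)."""
    if weights is None:
        return X, Z, y, None
    sqrtwts = np.sqrt(weights)
    X_w = X * sqrtwts[:, None]              # dense row scaling
    Z_w = (sp.diags(sqrtwts) @ Z).tocsc()   # sparse row scaling
    y_w = y * sqrtwts
    return X_w, Z_w, y_w, sqrtwts

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))
Z = sp.random(5, 3, density=0.5, format="csc", random_state=0)
y = rng.standard_normal(5)
w = np.array([1.0, 4.0, 1.0, 0.25, 1.0])
X_w, Z_w, y_w, sqrtwts = apply_sqrt_weights_sketch(X, Z, y, w)
```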

build_lambda_sparse

build_lambda_sparse(theta: np.ndarray, n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> sp.csc_matrix

Build sparse block-diagonal Lambda matrix from theta.

Lambda is constructed block-diagonally according to the random effects structure given by re_structure.

Parameters:

- theta (ndarray): Cholesky factor elements (lower triangle, column-major). Required.
- n_groups_list (list[int]): Number of groups per grouping factor. For simple models: [n_groups]. For nested/crossed: [n_groups_factor1, n_groups_factor2, ...]. Required.
- re_structure (str): Random effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed”. Required.
- metadata (dict | None): Optional metadata dict with structure information. May contain ‘re_structures_list’ for mixed models. Default: None.

Returns:

csc_matrix: Sparse block-diagonal Lambda matrix, CSC format.

Examples:

>>> # Random intercept: (1|group)
>>> theta = np.array([1.5])
>>> Lambda = build_lambda_sparse(theta, [10], "intercept")
>>> Lambda.shape
(10, 10)
>>> # Correlated slopes: (1+x|group)
>>> theta = np.array([1.5, 0.3, 1.2])  # [L00, L10, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "slope")
>>> Lambda.shape
(36, 36)  # 18 groups x 2 RE per group
>>> # Uncorrelated slopes: (1+x||group)
>>> theta = np.array([1.5, 1.2])  # [L00, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "diagonal")
>>> Lambda.shape
(36, 36)


build_lambda_template

build_lambda_template(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> LambdaTemplate

Build a Lambda template for efficient repeated updates.

Creates a sparse matrix template with the correct sparsity pattern, plus a mapping from theta indices to data array positions.

Parameters:

- n_groups_list (list[int]): Number of groups per grouping factor. Required.
- re_structure (str): Random effects structure type. Required.
- metadata (dict | None): Optional metadata dict with structure information. Default: None.

Returns:

LambdaTemplate: LambdaTemplate with template matrix and theta-to-data mapping.

Examples:

>>> # Create template for random intercept model
>>> template = build_lambda_template([18], "intercept")
>>> template.template.shape
(18, 18)
>>> # Update with new theta values
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda.data[0]
1.5


compute_agq_deviance

compute_agq_deviance(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, group_ids: np.ndarray, nAGQ: int, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, eta_init: np.ndarray | None = None, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None) -> float

Compute AGQ deviance for Stage 2 optimization.

This is the objective function for nAGQ > 1. It:

  1. Runs PIRLS with beta_fixed to get conditional modes u_hat

  2. Computes per-group conditional SDs from the S22 diagonal

  3. Evaluates the AGQ integral using Gauss-Hermite quadrature

  4. Returns -2 * sum_i(log_integral_i)

Only valid for scalar random intercept models (single grouping factor, intercept only).

Parameters:

- theta (ndarray): Cholesky factor elements (scalar for random intercept). Required.
- beta (ndarray): Fixed effects coefficients (p,). Required.
- X (ndarray): Fixed effects design matrix (n, p). Required.
- Z (csc_matrix): Random effects design matrix (n, q), scipy sparse. Required.
- y (ndarray): Response vector (n,). Required.
- family (Family): GLM family (binomial, poisson). Required.
- n_groups_list (list[int]): Number of groups per grouping factor. Required.
- re_structure (str): Random effects structure type. Required.
- group_ids (ndarray): Group membership array (n,), integer-coded. Required.
- nAGQ (int): Number of quadrature points (must be > 1). Required.
- metadata (dict | None): Additional RE metadata. Default: None.
- prior_weights (ndarray | None): Prior observation weights. Default: None.
- pirls_max_iter (int): Maximum PIRLS iterations per evaluation. Default: 30.
- pirls_tol (float): PIRLS convergence tolerance. Default: 1e-07.
- eta_init (ndarray | None): Initial linear predictor for PIRLS. Default: None.
- lambda_template (LambdaTemplate | None): Pre-built template for efficient Lambda updates. Default: None.
- factor_cache (dict | None): Mutable dict for caching Cholesky factors. Default: None.

Returns:

float: AGQ deviance (scalar) = -2 * sum(log marginal likelihood).

compute_irls_quantities

compute_irls_quantities(y: np.ndarray, eta: np.ndarray, family: Family) -> tuple[np.ndarray, np.ndarray]

Compute IRLS working weights and working response.

Parameters:

- y (ndarray): Response values (n,). Required.
- eta (ndarray): Linear predictor (n,). Required.
- family (Family): GLM family object. Required.

Returns:

- working_weights (ndarray): IRLS weights w = 1 / (V(mu) * (deta/dmu)^2)
- working_response (ndarray): Pseudo-response z = eta + (y - mu) * deta/dmu
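As a concrete instance: for a Poisson family with log link, V(mu) = mu and deta/dmu = 1/mu, so the formulas above reduce to w = mu and z = eta + (y - mu)/mu. A standalone sketch (not the module’s backend-generic code):

```python
import numpy as np

def poisson_irls_quantities(y, eta):
    """Working weights and response for Poisson with log link (sketch)."""
    mu = np.exp(eta)          # inverse link
    w = mu                    # 1 / (V(mu) * (deta/dmu)^2) = 1 / (mu * mu^{-2})
    z = eta + (y - mu) / mu   # eta + (y - mu) * deta/dmu
    return w, z

y = np.array([0.0, 2.0, 5.0])
eta = np.array([0.0, 0.5, 1.5])
w, z = poisson_irls_quantities(y, eta)
```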

compute_pls_invariants

compute_pls_invariants(X: np.ndarray, y: np.ndarray) -> PLSInvariants

Pre-compute quantities that are constant during optimization.

Call this once before the optimization loop to avoid redundant computation. X’X and X’y don’t depend on theta, so computing them once saves work.

Parameters:

- X (ndarray): Fixed effects design matrix, shape (n, p). Required.
- y (ndarray): Response vector, shape (n,). Required.

Returns:

PLSInvariants: PLSInvariants with pre-computed X’X, X’y, and backend arrays.

Examples:

>>> X = np.random.randn(100, 3)
>>> y = np.random.randn(100)
>>> inv = compute_pls_invariants(X, y)
>>> inv.XtX.shape
(3, 3)

fit_glm_irls

fit_glm_irls(y: np.ndarray, X: np.ndarray, family: Family, weights: np.ndarray | None = None, max_iter: int = 25, tol: float = 1e-08) -> dict

Fit GLM using IRLS algorithm.

Implements Iteratively Reweighted Least Squares for generalized linear models. The algorithm iterates until convergence (relative deviance change < tol) or max_iter is reached.

The core fitting loop is JIT-compiled for efficiency. All family-specific operations (link, variance, deviance) are pure JAX functions from the Family object.

Parameters:

- y (ndarray): Response values (n,). Required.
- X (ndarray): Design matrix (n, p). Required.
- family (Family): Family object with link, variance, and deviance functions. Required.
- weights (ndarray | None): Prior observation weights (n,), or None for equal weights. Default: None.
- max_iter (int): Maximum IRLS iterations. Default: 25.
- tol (float): Convergence tolerance for relative deviance change. Default: 1e-08.

Returns:

Dictionary with keys:

- coef: Coefficient estimates (p,)
- vcov: Variance-covariance matrix (p, p)
- dispersion: Dispersion parameter estimate
- fitted: Fitted values μ on response scale (n,)
- linear_predictor: Linear predictor η on link scale (n,)
- residuals: Response residuals y - μ (n,)
- deviance: Residual deviance
- null_deviance: Null model deviance
- df_residual: Residual degrees of freedom
- loglik: Log-likelihood
- converged: True if converged, False otherwise
- n_iter: Number of iterations performed
- has_separation: Separation detection (binomial only, None otherwise)
- irls_weights: Final IRLS weights (n,)
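The IRLS loop can be sketched for one concrete family. Below is a minimal NumPy version for a binomial GLM with logit link; this is a sketch only (the real implementation is JIT-compiled and family-generic), and `irls_logistic` is a hypothetical name:

```python
import numpy as np
from scipy.special import xlogy

def irls_logistic(y, X, max_iter=25, tol=1e-8):
    """Minimal IRLS for a binomial GLM with logit link (sketch only)."""
    beta = np.zeros(X.shape[1])
    eta = X @ beta
    dev_old = np.inf
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)                # 1 / (V(mu) * (deta/dmu)^2)
        z = eta + (y - mu) / w             # working response
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)   # weighted LS step
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        # binomial deviance; xlogy handles the 0*log(0) = 0 convention
        dev = 2.0 * np.sum(xlogy(y, y / mu) + xlogy(1 - y, (1 - y) / (1 - mu)))
        if abs(dev - dev_old) < tol * (abs(dev) + 0.1):
            break
        dev_old = dev
    return beta

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.standard_normal(200)])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0]))))
y = (rng.random(200) < p_true).astype(float)
beta_hat = irls_logistic(y, X)
```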


fit_glmm_pirls

fit_glmm_pirls(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, theta_init: np.ndarray | None = None, prior_weights: np.ndarray | None = None, max_outer_iter: int = 10000, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, two_stage: bool = True, lambda_template: 'LambdaTemplate | None' = None, nAGQ: int = 1, group_ids: np.ndarray | None = None) -> dict

Fit GLMM using PIRLS with outer optimization over theta.

The optimization strategy follows lme4’s two-stage approach.

Parameters:

- X (ndarray): Fixed effects design matrix (n, p). Required.
- Z (csc_matrix): Random effects design matrix (n, q), scipy sparse. Required.
- y (ndarray): Response vector (n,). Required.
- family (Family): GLM family (binomial, poisson). Required.
- n_groups_list (list[int]): Number of groups per grouping factor. Required.
- re_structure (str): Random effects structure type. Required.
- metadata (dict | None): Additional RE metadata. Default: None.
- theta_init (ndarray | None): Initial theta values (optional). Default: None.
- prior_weights (ndarray | None): Prior observation weights. Default: None.
- max_outer_iter (int): Maximum BOBYQA iterations for Stage 1. Default: 10000.
- pirls_max_iter (int): Maximum PIRLS iterations per theta evaluation. Default: 30.
- pirls_tol (float): PIRLS convergence tolerance (1e-7 matches lme4). Default: 1e-07.
- verbose (bool): Print optimization progress. Default: False.
- two_stage (bool): Use two-stage optimization matching lme4. Default: True.
- lambda_template (LambdaTemplate | None): Optional pre-built template for efficient Lambda updates. If provided, uses this template instead of building a new one. Useful for bootstrap, where the same template can be reused. Default: None.
- nAGQ (int): Number of adaptive Gauss-Hermite quadrature points. nAGQ=1 uses the Laplace approximation; nAGQ > 1 uses AGQ for more accurate integration (scalar random intercept only). Default: 1.
- group_ids (ndarray | None): Group membership array (n,), integer-coded. Required when nAGQ > 1 for per-group quadrature. Default: None.

Returns:

Dictionary with keys:

- theta: Optimized theta parameters
- beta: Fixed effects coefficients
- u: Spherical random effects
- eta: Linear predictor
- mu: Fitted values
- deviance: Final GLMM deviance
- loglik: Laplace log-likelihood
- converged: Whether optimizer converged
- n_outer_iter: Number of outer iterations
- n_func_evals: Number of deviance evaluations
- pirls_converged: Whether final PIRLS converged
- pirls_n_iter: Final PIRLS iterations

glmm_deviance

glmm_deviance(y: np.ndarray, mu: np.ndarray, family: Family, logdet: float, sqrL: float, prior_weights: np.ndarray | None = None) -> float

Compute GLMM deviance via Laplace approximation.

deviance_GLMM = sum(dev_resid) + logdet + ||u||^2

Parameters:

- y (ndarray): Response values (n,). Required.
- mu (ndarray): Fitted values mu = g^{-1}(eta) (n,). Required.
- family (Family): GLM family. Required.
- logdet (float): Log-determinant log|L’Z’WZL + I|. Required.
- sqrL (float): Sum of squared spherical random effects ||u||^2. Required.
- prior_weights (ndarray | None): Prior weights for observations (n,). For binomial with proportions, this is the number of trials. Default: None.

Returns:

float: GLMM deviance (scalar).
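The deviance is just the sum of three scalar pieces. A minimal sketch, using Poisson deviance residuals as a concrete (hypothetical) example and an arbitrary logdet value:

```python
import numpy as np
from scipy.special import xlogy

def glmm_deviance_sketch(dev_resids, logdet, u):
    """Laplace deviance: sum(dev_resid) + log|L'Z'WZL + I| + ||u||^2."""
    return float(np.sum(dev_resids) + logdet + np.dot(u, u))

# Poisson deviance residuals: d_i = 2*(y*log(y/mu) - (y - mu))
y = np.array([1.0, 3.0, 0.0])
mu = np.array([1.2, 2.5, 0.4])
dev_resids = 2.0 * (xlogy(y, y / mu) - (y - mu))
u = np.array([0.1, -0.2])
dev = glmm_deviance_sketch(dev_resids, logdet=0.8, u=u)
```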

glmm_deviance_objective

glmm_deviance_objective(theta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None, pattern_template: 'PatternTemplate | None' = None, pirls_result_cache: dict | None = None) -> float

Compute GLMM deviance for outer optimization.

This is the objective function for BOBYQA optimization over theta.

Parameters:

- theta (ndarray): Cholesky factor elements (lower triangle, column-major). Required.
- X (ndarray): Fixed effects design matrix (n, p). Required.
- Z (csc_matrix): Random effects design matrix (n, q), scipy sparse. Required.
- y (ndarray): Response vector (n,). Required.
- family (Family): GLM family (binomial, poisson). Required.
- n_groups_list (list[int]): Number of groups per grouping factor. Required.
- re_structure (str): Random effects structure type. Required.
- metadata (dict | None): Additional RE metadata. Default: None.
- prior_weights (ndarray | None): Prior observation weights. Default: None.
- pirls_max_iter (int): Maximum PIRLS iterations per theta evaluation. Default: 30.
- pirls_tol (float): PIRLS convergence tolerance (1e-7 matches lme4). Default: 1e-07.
- verbose (bool): Print verbose output. Default: False.
- lambda_template (LambdaTemplate | None): Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse, avoiding a rebuild of the sparsity pattern on each call (~10-15% speedup). Default: None.
- factor_cache (dict | None): Optional mutable dict for caching Cholesky factors across theta evaluations. Should contain keys ‘factor_S22’ and ‘factor_S’. Pass an empty dict {} to enable caching; factors are stored and reused across calls, avoiding repeated symbolic analysis. Default: None.
- pattern_template (PatternTemplate | None): PatternTemplate for preserving sparsity patterns. Required for cross-theta factor caching. When provided, ensures S22 has a consistent sparsity pattern even when theta hits boundaries (theta=0). Default: None.
- pirls_result_cache (dict | None): Optional mutable dict for caching the last PIRLS result. When provided, stores {"theta": theta_copy, "result": pirls_result} on each call so the caller can reuse the result at the converged theta without a redundant PIRLS evaluation. Default: None.

Returns:

float: GLMM deviance (scalar), the Laplace approximation of -2*logLik.

lmm_deviance_sparse

lmm_deviance_sparse(theta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, n_groups_list: list[int], re_structure: str, method: str = 'REML', lambda_template: 'LambdaTemplate | None' = None, pls_invariants: PLSInvariants | None = None, metadata: dict | None = None, sqrtwts: np.ndarray | None = None) -> float

Compute LMM deviance for optimization.

The deviance is -2 * log-likelihood, profiled over β and σ².

-2 log L = log|Λ’Z’ZΛ + I| + n log(2π·PWRSS/n) + n

-2 REML = log|Λ’Z’ZΛ + I| + log|X’V⁻¹X| + (n-p) log(2π·PWRSS/(n-p)) + (n-p)

For weighted models, the deviance includes an additional adjustment: -log|W| = -2*sum(log(sqrtwts))

This follows lme4 and MixedModels.jl: X, Z, y should already be pre-transformed by sqrt(weights) using apply_sqrt_weights().

Parameters:

- theta (ndarray): Cholesky factor elements (lower triangle, column-major). Required.
- X (ndarray): Fixed effects design matrix, shape (n, p). Should be pre-transformed by sqrt(weights) if weighted. Required.
- Z (csc_matrix): Random effects design matrix, shape (n, q), scipy sparse. Should be pre-transformed by sqrt(weights) if weighted. Required.
- y (ndarray): Response vector, shape (n,). Should be pre-transformed by sqrt(weights) if weighted. Required.
- n_groups_list (list[int]): Number of groups per grouping factor. Required.
- re_structure (str): Random effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed”. Required.
- method (str): Estimation method, either “ML” or “REML”. Default: “REML”.
- lambda_template (LambdaTemplate | None): Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse, avoiding a rebuild of the sparsity pattern on each call. Default: None.
- pls_invariants (PLSInvariants | None): Optional pre-computed invariants (X’X, X’y, JAX arrays). If provided, avoids redundant matrix computation in the optimization loop. Use compute_pls_invariants() to create this before the loop. Default: None.
- metadata (dict | None): Optional metadata dict with structure information. Required for crossed/nested/mixed structures with non-intercept factors. Should contain ‘re_structures_list’ specifying each factor’s structure. Default: None.
- sqrtwts (ndarray | None): Optional sqrt(weights), shape (n,). If provided, adds the log-determinant adjustment -2*sum(log(sqrtwts)) to the deviance, compensating for the Jacobian of the weight transformation. Default: None.

Returns:

float: Deviance value (scalar).

Examples:

>>> import numpy as np
>>> import scipy.sparse as sp
>>> # Simple random intercept model
>>> theta = np.array([1.5])
>>> X = np.random.randn(100, 2)
>>> Z = sp.random(100, 10, density=0.1, format='csc')
>>> y = np.random.randn(100)
>>> dev = lmm_deviance_sparse(theta, X, Z, y, [10], "intercept", "REML")
>>> dev > 0
True


optimize_theta

optimize_theta(objective: Callable, theta0: np.ndarray, lower: np.ndarray, upper: np.ndarray, rhobeg: float = 0.002, rhoend: float = 2e-07, maxfun: int = 10000, verbose: bool = False) -> dict

Optimize theta using BOBYQA via NLOPT.

BOBYQA is Powell’s derivative-free trust-region method for bound-constrained optimization. This implementation uses NLOPT’s LN_BOBYQA algorithm.

Parameters:

- objective (Callable): Function f(theta) -> scalar deviance to minimize. Required.
- theta0 (ndarray): Initial parameter values (array-like). Required.
- lower (ndarray): Lower bounds for parameters. Required.
- upper (ndarray): Upper bounds for parameters. Required.
- rhobeg (float): Initial trust region radius (sets the initial step size). Default: 0.002 (lme4’s default).
- rhoend (float): Final trust region radius (used to set tolerances). Default: 2e-07 (lme4’s default).
- maxfun (int): Maximum function evaluations. Default: 10000.
- verbose (bool): Print optimization progress. Default: False.

Returns:

Dictionary with keys:

- theta: Optimized parameters (np.ndarray)
- fun: Final objective value (float)
- converged: Whether optimization succeeded (bool)
- n_evals: Number of function evaluations (int)
- message: Status message (str)

Examples:

>>> def rosenbrock(theta):
...     x, y = theta
...     return (1 - x)**2 + 100 * (y - x**2)**2
>>> result = optimize_theta(
...     rosenbrock,
...     theta0=[0.0, 0.0],
...     lower=[-2.0, -2.0],
...     upper=[2.0, 2.0]
... )
>>> result['converged']
True
>>> np.allclose(result['theta'], [1.0, 1.0], atol=1e-3)
True

solve_pls_sparse

solve_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, y: np.ndarray, pls_invariants: PLSInvariants | None = None) -> dict

Solve Penalized Least Squares system using Schur complement.

Uses lme4’s algorithm: factor only S22 = Λ’Z’ZΛ + I, then use Schur complement to solve for beta. This avoids factoring the full augmented system.

Algorithm (matching lme4/src/predModule.cpp):

1. Factor S22 = Λ’Z’ZΛ + I via CHOLMOD
2. Compute RZX = L^{-1} * Λ’Z’X (forward solve)
3. Compute Schur = X’X - RZX’ * RZX (rank update)
4. Solve for beta via Schur complement
5. Back-solve for u

ZΛ is kept sparse throughout. For InstEval-scale data (73k obs × 4k groups), this uses ~50MB instead of ~2.4GB per iteration.

Parameters:

- X (ndarray): Fixed effects design matrix, shape (n, p). Required.
- Z (csc_matrix): Random effects design matrix, shape (n, q), scipy sparse CSC. Required.
- Lambda (csc_matrix): Cholesky-like factor, shape (q, q), scipy sparse CSC. Required.
- y (ndarray): Response vector, shape (n,). Required.
- pls_invariants (PLSInvariants | None): Optional pre-computed invariants (X’X, X’y, backend arrays). If provided, avoids redundant computation in the optimization loop. Use compute_pls_invariants() to create this before the loop. Default: None.

Returns:

Dictionary with keys:

- beta: Fixed effects coefficients, shape (p,).
- u: Spherical random effects, shape (q,).
- logdet_L: Log-determinant of (Λ’Z’ZΛ + I).
- rss: Residual sum of squares.
- logdet_RX: Log-determinant of Schur complement (for REML).

Examples:

>>> import numpy as np
>>> import scipy.sparse as sp
>>> n, p, q = 100, 3, 10
>>> X = np.random.randn(n, p)
>>> Z = sp.random(n, q, density=0.1, format='csc')
>>> Lambda = sp.eye(q, format='csc')
>>> y = np.random.randn(n)
>>> result = solve_pls_sparse(X, Z, Lambda, y)
>>> result['beta'].shape
(3,)
>>> result['u'].shape
(10,)
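The five Schur-complement steps can also be written out on dense matrices for clarity (a sketch under that simplification only; the real implementation keeps ZΛ sparse and factors S22 with CHOLMOD):

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import solve_triangular

def solve_pls_dense_sketch(X, Z, Lambda, y):
    """Dense sketch of the Schur-complement PLS solve (steps 1-5)."""
    ZL = (Z @ Lambda).toarray()
    S22 = ZL.T @ ZL + np.eye(ZL.shape[1])
    L = np.linalg.cholesky(S22)                              # 1. factor S22
    RZX = solve_triangular(L, ZL.T @ X, lower=True)          # 2. RZX = L^{-1} ZL'X
    cu = solve_triangular(L, ZL.T @ y, lower=True)
    schur = X.T @ X - RZX.T @ RZX                            # 3. rank update
    beta = np.linalg.solve(schur, X.T @ y - RZX.T @ cu)      # 4. beta via Schur
    u = solve_triangular(L.T, cu - RZX @ beta, lower=False)  # 5. back-solve u
    return beta, u

n, p, q = 50, 2, 4
rng = np.random.default_rng(2)
X = rng.standard_normal((n, p))
Z = sp.random(n, q, density=0.3, format="csc", random_state=2)
Lambda = sp.eye(q, format="csc")
y = rng.standard_normal(n)
beta, u = solve_pls_dense_sketch(X, Z, Lambda, y)
```

The result matches a direct solve of the full augmented normal equations; the Schur route simply avoids ever forming and factoring that larger system.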


solve_weighted_pls_sparse

solve_weighted_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, z: np.ndarray, weights: np.ndarray, beta_fixed: np.ndarray | None = None, ZL: sp.csc_matrix | None = None, ZL_dense: np.ndarray | None = None, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict

Solve weighted Penalized Least Squares for GLMM.

    [X’WX    X’WZL       ] [beta]   [X’Wz ]
    [L’Z’WX  L’Z’WZL + I ] [u   ] = [L’Z’Wz]

using sparse Cholesky factorization.

If beta_fixed is provided, solves only for u with beta held constant: [L’Z’WZL + I] [u] = [L’Z’W(z - X*beta)]

This is critical for Stage 2 optimization where we need to evaluate deviance at arbitrary (theta, beta) points without PIRLS re-optimizing beta.

Parameters:

- X (ndarray): Fixed effects design matrix (n, p). Required.
- Z (csc_matrix): Random effects design matrix (n, q), scipy sparse CSC. Required.
- Lambda (csc_matrix): Cholesky-like factor (q, q), scipy sparse CSC. Required.
- z (ndarray): Working response (n,). Required.
- weights (ndarray): IRLS weights (n,). Required.
- beta_fixed (ndarray | None): If provided, solve only for u with beta fixed to this value. Default: None.
- ZL (csc_matrix | None): Pre-computed Z @ Lambda (sparse). If provided, avoids redundant computation when called repeatedly with the same Z and Lambda. Default: None.
- ZL_dense (ndarray | None): Pre-computed ZL.toarray() (dense). If provided, avoids repeated sparse-to-dense materialization on each call. Default: None.
- factor_S22 (SparseFactorization | None): Cached Cholesky factor for the S22 block. If provided, uses cholesky_inplace() to update with new values (same sparsity pattern), avoiding expensive symbolic analysis on repeated calls. Default: None.
- factor_S (SparseFactorization | None): Cached Cholesky factor for the full augmented system S. Only used in full solve mode (when beta_fixed is None). Default: None.
- pattern_template (PatternTemplate | None): PatternTemplate for preserving sparsity patterns. When provided, ensures S22 has a consistent pattern for cross-theta caching. Required when using factor_S22 with cross-theta caching. Default: None.

Returns:

Dictionary with keys:

- beta: Fixed effects coefficients (p,)
- u: Spherical random effects (q,)
- logdet_L: Log-determinant of (L’Z’WZL + I)
- fitted_z: Fitted working response X*beta + ZL*u
- factor_S22: Cholesky factor for S22 (for caching)
- factor_S: Cholesky factor for full system S (for caching, full mode only)

theta_to_cholesky_block

theta_to_cholesky_block(theta: np.ndarray, dim: int) -> np.ndarray

Convert theta vector to lower-triangular Cholesky block.

Parameters:

- theta (ndarray): Cholesky factor elements (lower triangle, column-major). For 2x2: [L00, L10, L11], length 3. For 3x3: [L00, L10, L11, L20, L21, L22], length 6. Required.
- dim (int): Dimension of the Cholesky block (2, 3, etc.). Required.

Returns:

ndarray: Lower-triangular Cholesky block, shape (dim, dim).

Examples:

>>> theta = np.array([1.5, 0.3, 1.2])
>>> L = theta_to_cholesky_block(theta, 2)
>>> L
array([[1.5, 0. ],
       [0.3, 1.2]])


update_lambda_from_template

update_lambda_from_template(template: LambdaTemplate, theta: np.ndarray) -> sp.csc_matrix

Update Lambda matrix from template using new theta values.

Efficiently updates only the data values without rebuilding structure.

Parameters:

- template (LambdaTemplate): LambdaTemplate created by build_lambda_template. Required.
- theta (ndarray): New theta values. Required.

Returns:

csc_matrix: Updated sparse Lambda matrix in CSC format.

Examples:

>>> template = build_lambda_template([18], "intercept")
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda[0, 0]
1.5


Modules

glm

GLM IRLS fitting algorithm.

This module implements the Iteratively Reweighted Least Squares (IRLS) algorithm for GLM fitting. The algorithm supports both JAX and NumPy backends.

All functions are designed to work with Family objects.

Functions:

- compute_loglik: Compute log-likelihood for a fitted GLM.
- compute_null_deviance: Compute null deviance for the intercept-only model.
- fit_glm_irls: Fit GLM using IRLS algorithm.

Functions
compute_loglik
compute_loglik(y: np.ndarray, mu: np.ndarray, deviance: float | np.ndarray, dispersion: float | np.ndarray, family: Family, n: int, xp: Any) -> np.ndarray | float

Compute log-likelihood for a fitted GLM.

Uses family-specific formulas for exact log-likelihood computation. Falls back to the deviance-based approximation for unknown families.

Parameters:

- y (ndarray): Response values, shape (n,). Required.
- mu (ndarray): Fitted values on response scale, shape (n,). Required.
- deviance (float | ndarray): Residual deviance (scalar). Required.
- dispersion (float | ndarray): Dispersion parameter estimate (scalar). Required.
- family (Family): Family object with name attribute. Required.
- n (int): Number of observations. Required.
- xp (Any): Array module (numpy or jax.numpy). Required.

Returns:

ndarray | float: Log-likelihood as a scalar.
compute_null_deviance
compute_null_deviance(y: np.ndarray, family: Family, xp: Any) -> np.ndarray | float

Compute null deviance for the intercept-only model.

The null deviance measures how well the response is predicted by a model with only an intercept (the mean). It is used as a baseline for assessing model fit improvement.

Parameters:

- y (ndarray): Response values, shape (n,). Required.
- family (Family): Family object with deviance function. Required.
- xp (Any): Array module (numpy or jax.numpy). Required.

Returns:

ndarray | float: Null deviance as a scalar.
fit_glm_irls
fit_glm_irls(y: np.ndarray, X: np.ndarray, family: Family, weights: np.ndarray | None = None, max_iter: int = 25, tol: float = 1e-08) -> dict

Fit GLM using IRLS algorithm.

Implements Iteratively Reweighted Least Squares for generalized linear models. The algorithm iterates until convergence (relative deviance change < tol) or max_iter is reached.

The core fitting loop is JIT-compiled for efficiency. All family-specific operations (link, variance, deviance) are pure JAX functions from the Family object.

Parameters:

NameTypeDescriptionDefault
yndarrayResponse values (n,)required
XndarrayDesign matrix (n, p)required
familyFamilyFamily object with link, variance, and deviance functionsrequired
weightsndarray | NonePrior observation weights (n,) or None for equal weightsNone
max_iterintMaximum IRLS iterations (default: 25)25
tolfloatConvergence tolerance for relative deviance change (default: 1e-8)1e-08

Returns:

TypeDescription
dictDictionary with keys:
dict- coef: Coefficient estimates (p,)
dict- vcov: Variance-covariance matrix (p, p)
dict- dispersion: Dispersion parameter estimate
dict- fitted: Fitted values μ on response scale (n,)
dict- linear_predictor: Linear predictor η on link scale (n,)
dict- residuals: Response residuals y - μ (n,)
dict- deviance: Residual deviance
dict- null_deviance: Null model deviance
dict- df_residual: Residual degrees of freedom
dict- loglik: Log-likelihood
dict- converged: True if converged, False otherwise
dict- n_iter: Number of iterations performed
dict- has_separation: Separation detection (binomial only, None otherwise)
dict- irls_weights: Final IRLS weights (n,)

Notes:

glmer

PIRLS (Penalized Iteratively Reweighted Least Squares) algorithm for GLMMs.

This module implements the PIRLS algorithm for fitting generalized linear mixed models using Laplace approximation. The algorithm combines IRLS (from GLMs) with penalized least squares (from LMMs).

PIRLS Algorithm:

For fixed variance components theta:

1. Initialize eta from a GLM fit.
2. Until convergence:
   a. Compute working weights: w = 1 / (V(mu) * (deta/dmu)^2)
   b. Compute working response: z = eta + (y - mu) * deta/dmu
   c. Solve weighted penalized least squares:

      [X'WX     X'WZL      ] [beta]   [X'Wz  ]
      [L'Z'WX   L'Z'WZL + I] [u   ] = [L'Z'Wz]

   d. Update: eta = X*beta + ZL*u
   e. Check convergence on |delta_eta|
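One inner step of this loop, the weighted penalized least-squares solve, can be written densely (an illustrative sketch; the module works with sparse matrices and a Schur-complement solve):

```python
import numpy as np

def pls_step(X, ZL, w, z):
    """One weighted penalized least-squares solve, dense for clarity."""
    p, q = X.shape[1], ZL.shape[1]
    WX = X * w[:, None]
    WZL = ZL * w[:, None]
    # Augmented normal equations of the penalized problem:
    # minimize sum(w * (z - X beta - ZL u)^2) + ||u||^2.
    A = np.block([
        [X.T @ WX,  X.T @ WZL],
        [ZL.T @ WX, ZL.T @ WZL + np.eye(q)],
    ])
    rhs = np.concatenate([X.T @ (w * z), ZL.T @ (w * z)])
    sol = np.linalg.solve(A, rhs)
    return sol[:p], sol[p:]

rng = np.random.default_rng(1)
n, p, q = 12, 2, 3
X = rng.normal(size=(n, p))
ZL = rng.normal(size=(n, q))
w = rng.uniform(0.5, 2.0, size=n)
z = rng.normal(size=n)
beta, u = pls_step(X, ZL, w, z)
```

At the solution, the weighted residual is orthogonal to the columns of X, and its projection onto ZL equals u (the ridge penalty's stationarity condition).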

GLMM Deviance (Laplace approximation):

-2 log L ~ sum(dev_resid) + log|L'Z'WZL + I| + ||u||^2

Reference:

lme4 external.cpp:
- pwrssUpdate (lines 308-375)

MixedModels.jl generalizedlinearmixedmodel.jl:
- pirls! function (lines 600-655)
- deviance function (lines 84-109)

Functions:

NameDescription
compute_glmm_loglikCompute GLMM log-likelihood via Laplace approximation.
compute_hessian_vcovCompute Hessian-based vcov for fixed effects.
compute_irls_quantitiesCompute IRLS working weights and working response.
deriv12Compute gradient and Hessian using central finite differences.
fit_glmm_pirlsFit GLMM using PIRLS with outer optimization over theta.
get_theta_lower_boundsGet lower bounds for theta parameters.
glmm_devianceCompute GLMM deviance via Laplace approximation.
glmm_deviance_at_fixed_theta_betaCompute GLMM deviance at fixed (theta, beta) by solving only for u.
glmm_deviance_objectiveCompute GLMM deviance for outer optimization.
init_theta_glmmInitialize theta for GLMM fitting.
pirls_sparsePenalized Iteratively Reweighted Least Squares using sparse operations.
solve_weighted_pls_sparseSolve weighted Penalized Least Squares for GLMM.
Classes
Functions
compute_glmm_loglik
compute_glmm_loglik(y: np.ndarray, mu: np.ndarray, family: Family, sqrL: float, logdet: float, prior_weights: np.ndarray | None = None) -> float

Compute GLMM log-likelihood via Laplace approximation.

Matches lme4’s logLik() for GLMMs: loglik = cond_loglik - 0.5 * (||u||^2 + logdet)

where cond_loglik is the full conditional log-likelihood including normalizing constants (log(y!), log(choose(m,y)), etc.).

For weighted binomial (grouped), uses the full binomial log-likelihood with log(choose(m, y_counts)) to match lme4’s aic() function.

Parameters:

NameTypeDescriptionDefault
yndarrayResponse values (n,). For weighted binomial, these are proportions.required
mundarrayFitted values (n,).required
familyFamilyGLM family with loglik method.required
sqrLfloatSum of squared spherical random effects ||u||^2.required
logdetfloatLog-determinant log|L'Z'WZL + I|.required
prior_weightsndarray | NonePrior weights (n,). For binomial proportions, the number of trials per observation.None

Returns:

TypeDescription
floatLog-likelihood (scalar).
compute_hessian_vcov
compute_hessian_vcov(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, delta: float = 0.0001, eta_init: np.ndarray | None = None) -> tuple[np.ndarray, dict]

Compute Hessian-based vcov for fixed effects.

Matches lme4’s use.hessian=TRUE approach:

1. Compute the numerical Hessian of the deviance w.r.t. [theta, beta]
2. Invert the Hessian
3. Extract the beta-beta block
4. Symmetrize

This uses lme4’s deriv12() approach (simple central differences with delta=1e-4) for exact parity with R.

Parameters:

NameTypeDescriptionDefault
thetandarrayFinal theta (variance component) estimatesrequired
betandarrayFinal beta (fixed effect) estimatesrequired
XndarrayFixed effects design matrix (n, p)required
Zcsc_matrixRandom effects design matrix (n, q), scipy sparserequired
yndarrayResponse vector (n,)required
familyFamilyGLM family (binomial, poisson)required
n_groups_listlist[int]Number of groups per grouping factorrequired
re_structurestrRandom effects structure typerequired
metadatadict | NoneAdditional RE metadataNone
prior_weightsndarray | NonePrior observation weightsNone
pirls_max_iterintMaximum PIRLS iterations per deviance evaluation30
pirls_tolfloatPIRLS convergence tolerance1e-07
deltafloatFinite difference step size (default 1e-4, matches lme4)0.0001
eta_initndarray | NoneInitial linear predictor for PIRLS. If provided, used as starting point for all deviance evaluations. This provides numerical stability for models with large random effects.None

Returns:

TypeDescription
tuple[ndarray, dict]Tuple of: - vcov_beta: Variance-covariance matrix for beta, shape (p, p) - derivs: Dictionary with “gradient” and “Hessian” (for diagnostics)

Notes: If the Hessian is not positive definite, this function returns the pseudo-inverse of the beta-beta block, with a warning.

compute_irls_quantities
compute_irls_quantities(y: np.ndarray, eta: np.ndarray, family: Family) -> tuple[np.ndarray, np.ndarray]

Compute IRLS working weights and working response.

Parameters:

NameTypeDescriptionDefault
yndarrayResponse values (n,)required
etandarrayLinear predictor (n,)required
familyFamilyGLM family objectrequired

Returns:

NameTypeDescription
working_weightsndarrayIRLS weights w = 1 / (V(mu) * (deta/dmu)^2)
working_responsendarrayPseudo-response z = eta + (y - mu) * deta/dmu
deriv12
deriv12(func: Callable[[np.ndarray], float], x: np.ndarray, delta: float = 0.0001, fx: float | None = None) -> dict

Compute gradient and Hessian using central finite differences.

This matches lme4’s deriv12() function exactly (lme4/R/deriv.R). Uses simple central differences with step size delta.

The Hessian is computed simultaneously with the gradient for efficiency:
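A self-contained NumPy sketch of this central-difference scheme (illustrative; the function signature matches the one documented above, but the exact implementation in this module may differ):

```python
import numpy as np

def deriv12(func, x, delta=1e-4, fx=None):
    """Gradient and Hessian by simple central differences."""
    n = x.size
    fx = func(x) if fx is None else fx
    grad = np.zeros(n)
    hess = np.zeros((n, n))

    def f_shift(shifts):
        # Evaluate func at x with the given (index, step) perturbations applied.
        xs = x.copy()
        for i, h in shifts:
            xs[i] += h
        return func(xs)

    for i in range(n):
        fp = f_shift([(i, delta)])
        fm = f_shift([(i, -delta)])
        grad[i] = (fp - fm) / (2.0 * delta)
        hess[i, i] = (fp - 2.0 * fx + fm) / delta**2
        for j in range(i + 1, n):
            # Mixed partial via the four diagonal perturbations.
            hess[i, j] = hess[j, i] = (
                f_shift([(i, delta), (j, delta)])
                - f_shift([(i, delta), (j, -delta)])
                - f_shift([(i, -delta), (j, delta)])
                + f_shift([(i, -delta), (j, -delta)])
            ) / (4.0 * delta**2)
    return {"gradient": grad, "Hessian": hess}

def f(v):  # quadratic test function with known derivatives
    return v[0]**2 + 3.0 * v[0] * v[1] + 2.0 * v[1]**2

d = deriv12(f, np.array([1.0, 2.0]))
```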

Parameters:

NameTypeDescriptionDefault
funcCallable[[ndarray], float]Scalar function R^n -> R to differentiate.required
xndarrayPoint at which to compute derivatives, shape (n,).required
deltafloatStep size for finite differences (default 1e-4, matches lme4).0.0001
fxfloat | NoneOptional pre-computed f(x) to avoid redundant evaluation.None

Returns:

TypeDescription
dictDictionary with: - “gradient”: First derivative vector, shape (n,) - “Hessian”: Second derivative matrix, shape (n, n)

Note: This is O(h^2) accurate, matching lme4’s approach. For higher accuracy (O(h^8)), use compute_hessian_richardson from

fit_glmm_pirls
fit_glmm_pirls(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, theta_init: np.ndarray | None = None, prior_weights: np.ndarray | None = None, max_outer_iter: int = 10000, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, two_stage: bool = True, lambda_template: 'LambdaTemplate | None' = None, nAGQ: int = 1, group_ids: np.ndarray | None = None) -> dict

Fit GLMM using PIRLS with outer optimization over theta.

Optimization strategy (matching lme4):

The two-stage approach matches lme4 exactly:

Parameters:

NameTypeDescriptionDefault
XndarrayFixed effects design matrix (n, p)required
Zcsc_matrixRandom effects design matrix (n, q), scipy sparserequired
yndarrayResponse vector (n,)required
familyFamilyGLM family (binomial, poisson)required
n_groups_listlist[int]Number of groups per grouping factorrequired
re_structurestrRandom effects structure typerequired
metadatadict | NoneAdditional RE metadataNone
theta_initndarray | NoneInitial theta values (optional)None
prior_weightsndarray | NonePrior observation weightsNone
max_outer_iterintMaximum BOBYQA iterations for Stage 110000
pirls_max_iterintMaximum PIRLS iterations per theta evaluation30
pirls_tolfloatPIRLS convergence tolerance (1e-7 matches lme4)1e-07
verboseboolPrint optimization progressFalse
two_stageboolUse two-stage optimization matching lme4 (default True)True
lambda_template‘LambdaTemplate | None’Optional pre-built template for efficient Lambda updates. If provided, uses this template instead of building a new one. This is useful for bootstrap where the same template can be reused.None
nAGQintNumber of adaptive Gauss-Hermite quadrature points (default 1). nAGQ=1 uses the Laplace approximation. nAGQ > 1 uses AGQ for more accurate integration (scalar random intercept only).1
group_idsndarray | NoneGroup membership array (n,), integer-coded. Required when nAGQ > 1 for per-group quadrature.None

Returns:

TypeDescription
dictDictionary with:
dict- theta: Optimized theta parameters
dict- beta: Fixed effects coefficients
dict- u: Spherical random effects
dict- eta: Linear predictor
dict- mu: Fitted values
dict- deviance: Final GLMM deviance
dict- loglik: Laplace log-likelihood
dict- converged: Whether optimizer converged
dict- n_outer_iter: Number of outer iterations
dict- n_func_evals: Number of deviance evaluations
dict- pirls_converged: Whether final PIRLS converged
dict- pirls_n_iter: Final PIRLS iterations
get_theta_lower_bounds
get_theta_lower_bounds(n_theta: int, re_structure: str, metadata: dict | None = None) -> list[float]

Get lower bounds for theta parameters.

Diagonal elements of Cholesky factor must be non-negative. Off-diagonal elements are unbounded.
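The rule can be sketched for a single Cholesky block (an illustration of the bound layout under column-major lower-triangle ordering; helper name is hypothetical):

```python
import numpy as np

def lower_bounds_for_block(k):
    """Lower bounds for the theta elements of one k x k Cholesky block,
    stored lower-triangle, column-major."""
    bounds = []
    for col in range(k):
        for row in range(col, k):
            # Diagonal entries must be non-negative; off-diagonals are free.
            bounds.append(0.0 if row == col else -np.inf)
    return bounds

# Correlated intercept + slope, (1+x|g): theta = [L00, L10, L11]
print(lower_bounds_for_block(2))  # [0.0, -inf, 0.0]
```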

Parameters:

NameTypeDescriptionDefault
n_thetaintNumber of theta parametersrequired
re_structurestrRandom effects structure typerequired
metadatadict | NoneOptional metadata dict with ‘re_structures_list’ for crossed/nested/mixed structuresNone

Returns:

TypeDescription
list[float]List of lower bounds
glmm_deviance
glmm_deviance(y: np.ndarray, mu: np.ndarray, family: Family, logdet: float, sqrL: float, prior_weights: np.ndarray | None = None) -> float

Compute GLMM deviance via Laplace approximation.

deviance_GLMM = sum(dev_resid) + logdet + ||u||^2

Parameters:

NameTypeDescriptionDefault
yndarrayResponse values (n,)required
mundarrayFitted values mu = g^{-1}(eta) (n,)required
familyFamilyGLM familyrequired
logdetfloatLog-determinant log|L'Z'WZL + I|.required
sqrLfloatSum of squared spherical random effects ||u||^2.required
prior_weightsndarray | NonePrior weights for observations (n,). For binomial with proportions, this is the number of trials.None

Returns:

TypeDescription
floatGLMM deviance (scalar)
glmm_deviance_at_fixed_theta_beta
glmm_deviance_at_fixed_theta_beta(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, eta_init: np.ndarray | None = None, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None, pattern_template: 'PatternTemplate | None' = None) -> float

Compute GLMM deviance at fixed (theta, beta) by solving only for u.

This is the Stage 2 objective function for joint (theta, beta) optimization. Unlike glmm_deviance_objective which optimizes beta via PIRLS, this function holds beta fixed and only solves for the random effects u.

This matches lme4’s approach in Stage 2 where beta is added to the offset before PIRLS runs, so PIRLS only solves for u.

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major)required
betandarrayFixed effects coefficients (p,) - held constantrequired
XndarrayFixed effects design matrix (n, p)required
Zcsc_matrixRandom effects design matrix (n, q), scipy sparserequired
yndarrayResponse vector (n,)required
familyFamilyGLM family (binomial, poisson)required
n_groups_listlist[int]Number of groups per grouping factorrequired
re_structurestrRandom effects structure typerequired
metadatadict | NoneAdditional RE metadataNone
prior_weightsndarray | NonePrior observation weightsNone
pirls_max_iterintMaximum PIRLS iterations per theta evaluation30
pirls_tolfloatPIRLS convergence tolerance (1e-7 matches lme4)1e-07
verboseboolPrint verbose outputFalse
eta_initndarray | NoneInitial linear predictor for PIRLS. If provided, used as starting point instead of X @ beta. This matches lme4’s lp0 mechanism where the full linear predictor from Stage 1 (Xbeta + ZLambda*u) is saved and restored for Stage 2.None
lambda_template‘LambdaTemplate | None’Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse. This avoids rebuilding the sparsity pattern on each call (~10-15% speedup).None
factor_cachedict | NoneOptional mutable dict for caching Cholesky factors across theta evaluations. Pass an empty dict {} to enable caching. Caches factor_S22 and factor_S to avoid expensive Cholesky symbolic analysis on each evaluation.None
pattern_template‘PatternTemplate | None’PatternTemplate for preserving sparsity patterns. Required for cross-theta factor caching. When provided, ensures S22 has consistent sparsity pattern even when theta hits boundaries (theta=0).None

Returns:

TypeDescription
floatGLMM deviance (scalar) at the exact (theta, beta) point
glmm_deviance_objective
glmm_deviance_objective(theta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None, pattern_template: 'PatternTemplate | None' = None, pirls_result_cache: dict | None = None) -> float

Compute GLMM deviance for outer optimization.

This is the objective function for BOBYQA optimization over theta.

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major)required
XndarrayFixed effects design matrix (n, p)required
Zcsc_matrixRandom effects design matrix (n, q), scipy sparserequired
yndarrayResponse vector (n,)required
familyFamilyGLM family (binomial, poisson)required
n_groups_listlist[int]Number of groups per grouping factorrequired
re_structurestrRandom effects structure typerequired
metadatadict | NoneAdditional RE metadataNone
prior_weightsndarray | NonePrior observation weightsNone
pirls_max_iterintMaximum PIRLS iterations per theta evaluation30
pirls_tolfloatPIRLS convergence tolerance (1e-7 matches lme4)1e-07
verboseboolPrint verbose outputFalse
lambda_template‘LambdaTemplate | None’Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse. This avoids rebuilding the sparsity pattern on each call (~10-15% speedup).None
factor_cachedict | NoneOptional mutable dict for caching Cholesky factors across theta evaluations. Should contain keys ‘factor_S22’ and ‘factor_S’. Pass an empty dict {} to enable caching; factors will be stored and reused across calls, avoiding repeated symbolic analysis.None
pattern_template‘PatternTemplate | None’PatternTemplate for preserving sparsity patterns. Required for cross-theta factor caching. When provided, ensures S22 has consistent sparsity pattern even when theta hits boundaries (theta=0).None
pirls_result_cachedict | NoneOptional mutable dict for caching the last PIRLS result. When provided, stores {"theta": theta_copy, "result": pirls_result} on each call so the caller can reuse the result at the converged theta without a redundant PIRLS evaluation.None

Returns:

TypeDescription
floatGLMM deviance (scalar) - the Laplace approximation of -2*logLik
init_theta_glmm
init_theta_glmm(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> np.ndarray

Initialize theta for GLMM fitting.

Uses conservative defaults. Unlike LMMs, GLMMs work on the link scale so moment-based initialization is less useful.

Parameters:

NameTypeDescriptionDefault
n_groups_listlist[int]Number of groups per grouping factor.required
re_structurestrRandom effects structure type.required
metadatadict | NoneOptional metadata dict containing ‘random_names’ or ‘re_terms’ to determine correct number of theta parameters.None

Returns:

TypeDescription
ndarrayInitial theta values.
pirls_sparse
pirls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, y: np.ndarray, family: Family, eta_init: np.ndarray | None = None, prior_weights: np.ndarray | None = None, beta_fixed: np.ndarray | None = None, max_iter: int = 30, tol: float = 1e-07, verbose: bool = False, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict

Penalized Iteratively Reweighted Least Squares using sparse operations.

For fixed variance parameters encoded in Lambda, iteratively solves for beta and u using weighted penalized least squares. This is the inner loop of GLMM optimization.

Parameters:

NameTypeDescriptionDefault
XndarrayFixed effects design matrix (n, p)required
Zcsc_matrixRandom effects design matrix (n, q), scipy sparserequired
Lambdacsc_matrixCholesky-like factor from theta (q, q), scipy sparserequired
yndarrayResponse vector (n,)required
familyFamilyGLM family (binomial, poisson)required
eta_initndarray | NoneInitial linear predictor (optional)None
prior_weightsndarray | NonePrior observation weights (n,). For binomial with proportions, this should be the number of trials.None
beta_fixedndarray | NoneIf provided, solve only for u with beta held fixed. This is used in Stage 2 optimization to evaluate deviance at arbitrary (theta, beta) points without re-optimizing beta.None
max_iterintMaximum PIRLS iterations30
tolfloatConvergence tolerance on relative deviance change (1e-7 matches lme4)1e-07
verboseboolPrint iteration infoFalse
factor_S22‘SparseFactorization | None’Cached Cholesky factor for S22 block (L’Z’WZL + I). If provided, reuses symbolic analysis across calls. Useful for caching across theta evaluations in the optimizer.None
factor_S‘SparseFactorization | None’Cached Cholesky factor for full augmented system. Only used when beta_fixed is None (full solve mode).None
pattern_template‘PatternTemplate | None’PatternTemplate for preserving sparsity patterns. When provided, ensures S22 has consistent pattern for cross-theta caching. This is required for cross-theta factor caching to work correctly when theta has boundary values (theta=0).None

Returns:

TypeDescription
dictDictionary with keys:
dict- beta: Fixed effects coefficients (p,)
dict- u: Spherical random effects (q,)
dict- eta: Final linear predictor (n,)
dict- mu: Final fitted values (n,)
dict- deviance: GLMM deviance at convergence
dict- converged: Whether PIRLS converged
dict- n_iter: Number of iterations
dict- logdet_L: Log-determinant of (L'Z'WZL + I)
dict- factor_S22: Cholesky factor for S22 (for caching)
dict- factor_S: Cholesky factor for full system (for caching, full mode only)
solve_weighted_pls_sparse
solve_weighted_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, z: np.ndarray, weights: np.ndarray, beta_fixed: np.ndarray | None = None, ZL: sp.csc_matrix | None = None, ZL_dense: np.ndarray | None = None, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict

Solve weighted Penalized Least Squares for GLMM.

[X'WX     X'WZL      ] [beta]   [X'Wz  ]
[L'Z'WX   L'Z'WZL + I] [u   ] = [L'Z'Wz]

using sparse Cholesky factorization.

If beta_fixed is provided, solves only for u with beta held constant: [L’Z’WZL + I] [u] = [L’Z’W(z - X*beta)]

This is critical for Stage 2 optimization where we need to evaluate deviance at arbitrary (theta, beta) points without PIRLS re-optimizing beta.
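The beta-fixed mode can be sketched with SciPy sparse operations (an illustration only: it uses SciPy's sparse LU in place of the cached sparse Cholesky factorization this function uses, and omits logdet and the full-solve mode):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def solve_u_beta_fixed(X, ZL, z, w, beta):
    """Solve [ZL'W ZL + I] u = ZL'W (z - X beta) with beta held constant."""
    q = ZL.shape[1]
    W = sp.diags(w)
    # S22 block: the only system needed when beta is fixed.
    S22 = (ZL.T @ W @ ZL + sp.identity(q)).tocsc()
    rhs = ZL.T @ (w * (z - X @ beta))
    return splu(S22).solve(rhs)

rng = np.random.default_rng(2)
n, p, q = 15, 2, 4
X = rng.normal(size=(n, p))
ZL = sp.random(n, q, density=0.5, format="csc", random_state=3)
w = rng.uniform(0.5, 2.0, size=n)
z = rng.normal(size=n)
beta = np.array([0.2, -0.1])
u = solve_u_beta_fixed(X, ZL, z, w, beta)
```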

Parameters:

NameTypeDescriptionDefault
XndarrayFixed effects design matrix (n, p)required
Zcsc_matrixRandom effects design matrix (n, q), scipy sparse CSCrequired
Lambdacsc_matrixCholesky-like factor (q, q), scipy sparse CSCrequired
zndarrayWorking response (n,)required
weightsndarrayIRLS weights (n,)required
beta_fixedndarray | NoneIf provided, solve only for u with beta fixed to this value.None
ZLcsc_matrix | NonePre-computed Z @ Lambda (sparse). If provided, avoids redundant computation when called repeatedly with same Z and Lambda.None
ZL_densendarray | NonePre-computed ZL.toarray() (dense). If provided, avoids repeated sparse-to-dense materialization on each call.None
factor_S22‘SparseFactorization | None’Cached Cholesky factor for S22 block. If provided, uses cholesky_inplace() to update with new values (same sparsity pattern). This avoids expensive symbolic analysis on repeated calls.None
factor_S‘SparseFactorization | None’Cached Cholesky factor for full augmented system S. Only used in full solve mode (when beta_fixed is None).None
pattern_template‘PatternTemplate | None’PatternTemplate for preserving sparsity patterns. When provided, ensures S22 has consistent pattern for cross-theta caching. Required when using factor_S22 with cross-theta caching.None

Returns:

TypeDescription
dictDictionary with keys:
dict- beta: Fixed effects coefficients (p,)
dict- u: Spherical random effects (q,)
dict- logdet_L: Log-determinant of (L’Z’WZL + I)
dict- fitted_z: Fitted working response Xbeta + ZL*u
dict- factor_S22: Cholesky factor for S22 (for caching)
dict- factor_S: Cholesky factor for full system S (for caching, full mode only)

heuristics

Heuristics and numerical utilities for GLMM fitting.

This module contains initialization heuristics, bound computation, and finite-difference derivative routines used by the GLMM fitter in glmer.py.

Classes
Functions
compute_hessian_vcov
compute_hessian_vcov(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, delta: float = 0.0001, eta_init: np.ndarray | None = None) -> tuple[np.ndarray, dict]

Compute Hessian-based vcov for fixed effects.

Matches lme4’s use.hessian=TRUE approach:

1. Compute the numerical Hessian of the deviance w.r.t. [theta, beta]
2. Invert the Hessian
3. Extract the beta-beta block
4. Symmetrize

This uses lme4’s deriv12() approach (simple central differences with delta=1e-4) for exact parity with R.

Parameters:

NameTypeDescriptionDefault
thetandarrayFinal theta (variance component) estimatesrequired
betandarrayFinal beta (fixed effect) estimatesrequired
XndarrayFixed effects design matrix (n, p)required
Zcsc_matrixRandom effects design matrix (n, q), scipy sparserequired
yndarrayResponse vector (n,)required
familyFamilyGLM family (binomial, poisson)required
n_groups_listlist[int]Number of groups per grouping factorrequired
re_structurestrRandom effects structure typerequired
metadatadict | NoneAdditional RE metadataNone
prior_weightsndarray | NonePrior observation weightsNone
pirls_max_iterintMaximum PIRLS iterations per deviance evaluation30
pirls_tolfloatPIRLS convergence tolerance1e-07
deltafloatFinite difference step size (default 1e-4, matches lme4)0.0001
eta_initndarray | NoneInitial linear predictor for PIRLS. If provided, used as starting point for all deviance evaluations. This provides numerical stability for models with large random effects.None

Returns:

TypeDescription
tuple[ndarray, dict]Tuple of: - vcov_beta: Variance-covariance matrix for beta, shape (p, p) - derivs: Dictionary with “gradient” and “Hessian” (for diagnostics)

Notes: If the Hessian is not positive definite, this function returns the pseudo-inverse of the beta-beta block, with a warning.

deriv12
deriv12(func: Callable[[np.ndarray], float], x: np.ndarray, delta: float = 0.0001, fx: float | None = None) -> dict

Compute gradient and Hessian using central finite differences.

This matches lme4’s deriv12() function exactly (lme4/R/deriv.R). Uses simple central differences with step size delta.

The Hessian is computed simultaneously with the gradient for efficiency:

Parameters:

NameTypeDescriptionDefault
funcCallable[[ndarray], float]Scalar function R^n -> R to differentiate.required
xndarrayPoint at which to compute derivatives, shape (n,).required
deltafloatStep size for finite differences (default 1e-4, matches lme4).0.0001
fxfloat | NoneOptional pre-computed f(x) to avoid redundant evaluation.None

Returns:

TypeDescription
dictDictionary with: - “gradient”: First derivative vector, shape (n,) - “Hessian”: Second derivative matrix, shape (n, n)

Note: This is O(h^2) accurate, matching lme4’s approach. For higher accuracy (O(h^8)), use compute_hessian_richardson from

get_theta_lower_bounds
get_theta_lower_bounds(n_theta: int, re_structure: str, metadata: dict | None = None) -> list[float]

Get lower bounds for theta parameters.

Diagonal elements of Cholesky factor must be non-negative. Off-diagonal elements are unbounded.

Parameters:

NameTypeDescriptionDefault
n_thetaintNumber of theta parametersrequired
re_structurestrRandom effects structure typerequired
metadatadict | NoneOptional metadata dict with ‘re_structures_list’ for crossed/nested/mixed structuresNone

Returns:

TypeDescription
list[float]List of lower bounds
init_theta_glmm
init_theta_glmm(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> np.ndarray

Initialize theta for GLMM fitting.

Uses conservative defaults. Unlike LMMs, GLMMs work on the link scale so moment-based initialization is less useful.

Parameters:

NameTypeDescriptionDefault
n_groups_listlist[int]Number of groups per grouping factor.required
re_structurestrRandom effects structure type.required
metadatadict | NoneOptional metadata dict containing ‘random_names’ or ‘re_terms’ to determine correct number of theta parameters.None

Returns:

TypeDescription
ndarrayInitial theta values.

initialization

Moment-based initialization for LMM variance components.

Functions:

NameDescription
compute_moment_initCompute moment-based starting values for theta.
Functions
compute_moment_init
compute_moment_init(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, group_ids_list: list[np.ndarray], n_groups_list: list[int], re_structure: str, X_re: np.ndarray | None = None, metadata: dict | None = None) -> np.ndarray

Compute moment-based starting values for theta.

Dispatches to appropriate initializer based on re_structure.

Parameters:

NameTypeDescriptionDefault
XndarrayFixed effects design matrix, shape (n, p).required
Zcsc_matrixRandom effects design matrix, shape (n, q), scipy sparse.required
yndarrayResponse vector, shape (n,).required
group_ids_listlist[ndarray]List of group ID arrays (one per grouping factor). For simple models: single array of shape (n,) with values 0 to n_groups-1 For crossed/nested: multiple arraysrequired
n_groups_listlist[int]Number of groups per grouping factor.required
re_structurestrRandom effects structure type. Options: “intercept”, “slope”, “diagonal”, “crossed”, “nested”, “mixed”required
X_rendarray | NoneRandom effects design matrix, shape (n, r). For slopes: typically (n, 2) with columns [ones, predictor] Not needed for intercept-only models.None
metadatadict | NoneOptional metadata dict with structure information. For crossed/nested/mixed, should contain ‘re_structures_list’ specifying each factor’s structure (e.g., [‘slope’, ‘intercept’]).None

Returns:

TypeDescription
ndarrayInitial theta values, shape varies by structure:
ndarray- Intercept: (1,) or (n_factors,) for crossed/nested
ndarray- Slope: (3,) for 2x2 Cholesky
ndarray- Diagonal: (k,) for k RE per group
ndarray- Crossed/nested with mixed: concatenation per factor

Examples:

>>> import numpy as np
>>> import scipy.sparse as sp
>>> # Simple random intercept
>>> X = np.random.randn(100, 2)
>>> Z = sp.random(100, 10, density=0.1, format='csc')
>>> y = np.random.randn(100)
>>> group_ids = [np.repeat(np.arange(10), 10)]
>>> theta = compute_moment_init(X, Z, y, group_ids, [10], "intercept")
>>> theta.shape
(1,)

Notes:

lambda_builder

Lambda matrix construction from theta parameters.

This module is a re-export facade. Implementation lives in:

All public names are re-exported here for backward compatibility.

Classes:

NameDescription
LambdaTemplateTemplate for efficient Lambda matrix updates during optimization.
PatternTemplateTemplate for preserving sparsity patterns across theta evaluations.

Functions:

NameDescription
build_lambda_sparseBuild sparse block-diagonal Lambda matrix from theta.
build_lambda_templateBuild a Lambda template for efficient repeated updates.
build_pattern_templateBuild pattern template for cross-theta Cholesky caching.
cholesky_block_to_thetaConvert lower-triangular Cholesky block to theta vector.
compute_s22_preserve_patternCompute ZL’ @ W @ ZL + I while preserving S22_pattern’s sparsity.
compute_zl_preserve_patternCompute Z @ Lambda while preserving the sparsity pattern of ZL_pattern.
theta_to_cholesky_blockConvert theta vector to lower-triangular Cholesky block.
theta_to_variance_paramsConvert theta (Cholesky elements) to variance parameters.
update_lambda_from_templateUpdate Lambda matrix from template using new theta values.
Classes
LambdaTemplate
LambdaTemplate(template: sp.csc_matrix, theta_to_data_map: list[np.ndarray], re_structure: str, n_groups_list: list[int]) -> None

Template for efficient Lambda matrix updates during optimization.

Caches the sparsity pattern of Lambda so that repeated calls only update the data values, avoiding reconstruction of indices.
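The idea can be sketched with SciPy (an illustrative toy for a "slope" structure with 3 groups; the real template and its theta_to_data_map are produced by build_lambda_template):

```python
import numpy as np
import scipy.sparse as sp

# Build Lambda once with placeholder values, record which slots in .data each
# theta element owns, then update .data in place on every theta evaluation.
block = sp.csc_matrix(np.array([[1.0, 0.0], [1.0, 1.0]]))  # 2x2 lower triangle
template = sp.block_diag([block] * 3, format="csc")

# In CSC order each block contributes [L00, L10, L11] to template.data,
# so theta element t owns data positions t, t+3, t+6.
theta_to_data_map = [np.arange(t, 9, 3) for t in range(3)]

def update_lambda(template, theta_to_data_map, theta):
    # Only the values change; indices/indptr (the sparsity pattern) are reused.
    for t, positions in enumerate(theta_to_data_map):
        template.data[positions] = theta[t]
    return template

Lam = update_lambda(template, theta_to_data_map, np.array([1.5, 0.3, 1.2]))
```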

Attributes:

NameTypeDescription
templatecsc_matrixSparse CSC matrix with correct structure but placeholder values.
theta_to_data_maplist[ndarray]List mapping theta indices to positions in template.data. Each element is an int64 array of indices into template.data.
re_structurestrThe random effects structure type.
n_groups_listlist[int]Number of groups per factor.
Attributes
n_groups_list
n_groups_list: list[int]
re_structure
re_structure: str
template
template: sp.csc_matrix
theta_to_data_map
theta_to_data_map: list[np.ndarray]
PatternTemplate
PatternTemplate(ZL_pattern: sp.csc_matrix, S22_pattern: sp.csc_matrix, Z: sp.csc_matrix, lambda_template: 'LambdaTemplate') -> None

Template for preserving sparsity patterns across theta evaluations.

This is critical for cross-theta Cholesky caching. When theta has boundary values (theta=0), the resulting matrices would normally have fewer non-zeros. This template ensures the sparsity pattern remains constant by storing explicit zeros where needed.

Used to match lme4’s behavior where the sparsity pattern is fixed at initialization and never changes during optimization.

Attributes:

NameTypeDescription
ZL_patterncsc_matrixSparsity pattern of Z @ Lambda (computed with theta=ones).
S22_patterncsc_matrixSparsity pattern of ZL’ @ ZL + I (computed with theta=ones).
Zcsc_matrixReference to the original Z matrix for value computation.
lambda_template‘LambdaTemplate’Reference to the LambdaTemplate.
Attributes
S22_pattern
S22_pattern: sp.csc_matrix
Z
Z: sp.csc_matrix
ZL_pattern
ZL_pattern: sp.csc_matrix
lambda_template
lambda_template: 'LambdaTemplate'
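The explicit-zero idea can be seen in a toy sketch (illustrative code only, not this module's implementation; build_pattern_template constructs the real patterns):

```python
import numpy as np
import scipy.sparse as sp

# With a boundary theta, Z @ Lambda loses stored entries once zeros are
# pruned, so values are scattered into a copy of the fixed reference
# pattern instead of recomputing the structure.
Z = sp.csc_matrix(np.array([[1.0, 0.0],
                            [0.0, 2.0],
                            [3.0, 0.0]]))
Lam_ones = sp.diags([1.0, 1.0], format="csc")   # reference theta = ones
Lam_zero = sp.diags([1.0, 0.0], format="csc")   # boundary theta

pattern = (Z @ Lam_ones).tocsc()                # maximal pattern: 3 stored entries
ZL = (Z @ Lam_zero).tocsc()
ZL.eliminate_zeros()                            # pruning shrinks the pattern
assert ZL.nnz == 2

# Pattern-preserving product: keep the reference structure, refresh values.
ZL_fixed = pattern.copy()
dense_vals = (Z @ Lam_zero).toarray()
for j in range(ZL_fixed.shape[1]):              # fill stored entries per CSC column
    start, end = ZL_fixed.indptr[j], ZL_fixed.indptr[j + 1]
    ZL_fixed.data[start:end] = dense_vals[ZL_fixed.indices[start:end], j]
assert ZL_fixed.nnz == 3                        # explicit zero retained
```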
Functions
build_lambda_sparse
build_lambda_sparse(theta: np.ndarray, n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> sp.csc_matrix

Build sparse block-diagonal Lambda matrix from theta.

Lambda is constructed based on the random effects structure:

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major).required
n_groups_listlist[int]Number of groups per grouping factor. For simple models: [n_groups] For nested/crossed: [n_groups_factor1, n_groups_factor2, ...]required
re_structurestrRandom effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed”required
metadatadict | NoneOptional metadata dict with structure information. May contain ‘re_structures_list’ for mixed models.None

Returns:

TypeDescription
csc_matrixSparse block-diagonal Lambda matrix, CSC format.

Examples:

>>> # Random intercept: (1|group)
>>> theta = np.array([1.5])
>>> Lambda = build_lambda_sparse(theta, [10], "intercept")
>>> Lambda.shape
(10, 10)
>>> # Correlated slopes: (1+x|group)
>>> theta = np.array([1.5, 0.3, 1.2])  # [L00, L10, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "slope")
>>> Lambda.shape
(36, 36)  # 18 groups x 2 RE per group
>>> # Uncorrelated slopes: (1+x||group)
>>> theta = np.array([1.5, 1.2])  # [L00, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "diagonal")
>>> Lambda.shape
(36, 36)

Notes:

build_lambda_template
build_lambda_template(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> LambdaTemplate

Build a Lambda template for efficient repeated updates.

Creates a sparse matrix template with the correct sparsity pattern, plus a mapping from theta indices to data array positions.

Parameters:

NameTypeDescriptionDefault
n_groups_listlist[int]Number of groups per grouping factor.required
re_structurestrRandom effects structure type.required
metadatadict | NoneOptional metadata dict with structure information.None

Returns:

TypeDescription
LambdaTemplateLambdaTemplate with template matrix and theta-to-data mapping.

Examples:

>>> # Create template for random intercept model
>>> template = build_lambda_template([18], "intercept")
>>> template.template.shape
(18, 18)
>>> # Update with new theta values
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda.data[0]
1.5

Notes:

build_pattern_template
build_pattern_template(Z: sp.csc_matrix, lambda_template: LambdaTemplate) -> PatternTemplate

Build pattern template for cross-theta Cholesky caching.

Computes the “maximal” sparsity patterns of ZL and S22 using theta=ones. These patterns are used to ensure consistent sparsity across all theta evaluations, even when theta components are zero (boundary conditions).

This matches lme4’s approach where the sparsity pattern is fixed at model initialization and never changes during optimization.

Parameters:

NameTypeDescriptionDefault
Zcsc_matrixRandom effects design matrix (n, q), scipy sparse CSC.required
lambda_templateLambdaTemplateLambdaTemplate with pattern information.required

Returns:

TypeDescription
PatternTemplatePatternTemplate with ZL and S22 patterns.

Examples:

>>> Z = sp.random(100, 50, density=0.1, format='csc')
>>> lambda_tpl = build_lambda_template([50], "intercept")
>>> pattern_tpl = build_pattern_template(Z, lambda_tpl)
>>> # Now use pattern_tpl for pattern-preserved computations
cholesky_block_to_theta
cholesky_block_to_theta(L: np.ndarray) -> np.ndarray

Convert lower-triangular Cholesky block to theta vector.

Parameters:

NameTypeDescriptionDefault
LndarrayLower triangular Cholesky block, shape (dim, dim).required

Returns:

TypeDescription
ndarrayTheta vector (lower triangle elements, column-major).

Examples:

>>> L = np.array([[1.5, 0.0], [0.3, 1.2]])
>>> theta = cholesky_block_to_theta(L)
>>> theta
array([1.5, 0.3, 1.2])

Notes:

compute_s22_preserve_pattern
compute_s22_preserve_pattern(ZL: sp.csc_matrix, W: sp.csc_matrix, S22_pattern: sp.csc_matrix) -> sp.csc_matrix

Compute ZL’ @ W @ ZL + I while preserving S22_pattern’s sparsity.

Similar to compute_zl_preserve_pattern, this ensures S22 always has the same sparsity pattern regardless of theta values.

Parameters:

NameTypeDescriptionDefault
ZLcsc_matrixPattern-preserved ZL matrix from compute_zl_preserve_pattern.required
Wcsc_matrixDiagonal weight matrix.required
S22_patterncsc_matrixThe “maximal” S22 pattern from build_pattern_template.required

Returns:

TypeDescription
csc_matrixS22 with S22_pattern’s sparsity structure but current values.
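The quantity being formed can be illustrated with random data (a sketch, not a call into this function):

```python
import numpy as np
import scipy.sparse as sp

# S22 = ZL' @ W @ ZL + I with a positive diagonal W is symmetric and has
# all eigenvalues >= 1, which is why its Cholesky factorization is safe.
rng = np.random.default_rng(0)
ZL = sp.random(20, 6, density=0.3, format="csc", random_state=0)
W = sp.diags(rng.uniform(0.5, 2.0, size=20), format="csc")

S22 = (ZL.T @ W @ ZL + sp.eye(6, format="csc")).toarray()
assert np.allclose(S22, S22.T)                        # symmetric
assert np.all(np.linalg.eigvalsh(S22) >= 1.0 - 1e-9)  # eigenvalues >= 1
```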
compute_zl_preserve_pattern
compute_zl_preserve_pattern(Z: sp.csc_matrix, Lambda: sp.csc_matrix, ZL_pattern: sp.csc_matrix) -> sp.csc_matrix

Compute Z @ Lambda while preserving the sparsity pattern of ZL_pattern.

When theta has zeros (boundary), scipy’s Z @ Lambda prunes explicit zeros, changing the sparsity pattern. This function ensures the result always has the same pattern as ZL_pattern by adding explicit zeros where needed.

This is critical for CHOLMOD caching: the factorization reuses the symbolic analysis, which depends on having a consistent sparsity pattern.

Parameters:

NameTypeDescriptionDefault
Zcsc_matrixRandom effects design matrix (n, q).required
Lambdacsc_matrixCurrent Lambda matrix (may have zeros from boundary theta).required
ZL_patterncsc_matrixThe “maximal” ZL pattern from build_pattern_template.required

Returns:

TypeDescription
csc_matrixZL with ZL_pattern’s sparsity structure but current values.

Notes:

theta_to_cholesky_block
theta_to_cholesky_block(theta: np.ndarray, dim: int) -> np.ndarray

Convert theta vector to lower-triangular Cholesky block.

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major). For 2x2: [L00, L10, L11], length = 3 For 3x3: [L00, L10, L11, L20, L21, L22], length = 6required
dimintDimension of the Cholesky block (2, 3, etc.).required

Returns:

TypeDescription
ndarrayLower triangular Cholesky block, shape (dim, dim).

Examples:

>>> theta = np.array([1.5, 0.3, 1.2])
>>> L = theta_to_cholesky_block(theta, 2)
>>> L
array([[1.5, 0. ],
       [0.3, 1.2]])

Notes:

theta_to_variance_params
theta_to_variance_params(theta: np.ndarray, is_diagonal: bool = False) -> tuple[np.ndarray, np.ndarray]

Convert theta (Cholesky elements) to variance parameters.

Computes standard deviations and correlations from Cholesky factor.

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major). For intercept: [L00], length = 1 For slopes (2D): [L00, L10, L11], length = 3 For diagonal (2D): [L00, L11], length = 2required
is_diagonalboolIf True, theta contains only diagonal elements (no off-diagonal correlations).False

Returns:

TypeDescription
ndarrayTuple of (standard_deviations, correlations):
ndarray- standard_deviations: sqrt(diag(Sigma)), shape (n,)
tuple[ndarray, ndarray]- correlations: off-diagonal correlations, shape (n*(n-1)/2,)

Examples:

>>> # Correlated slopes
>>> theta = np.array([1.5, 0.3, 1.2])
>>> sds, rhos = theta_to_variance_params(theta)
>>> sds  # sqrt(diag(L @ L.T))
array([1.5       , 1.23693169])
>>> rhos  # correlation between intercept and slope
array([0.24253563])
>>> # Uncorrelated slopes
>>> theta = np.array([1.5, 1.2])
>>> sds, rhos = theta_to_variance_params(theta, is_diagonal=True)
>>> sds
array([1.5, 1.2])
>>> rhos  # empty for diagonal structure
array([])

Notes:

update_lambda_from_template
update_lambda_from_template(template: LambdaTemplate, theta: np.ndarray) -> sp.csc_matrix

Update Lambda matrix from template using new theta values.

Efficiently updates only the data values without rebuilding structure.

Parameters:

NameTypeDescriptionDefault
templateLambdaTemplateLambdaTemplate created by build_lambda_template.required
thetandarrayNew theta values.required

Returns:

TypeDescription
csc_matrixUpdated sparse Lambda matrix in CSC format.

Examples:

>>> template = build_lambda_template([18], "intercept")
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda[0, 0]
1.5

Notes:

lambda_sparse

Sparse Lambda matrix construction and theta-Cholesky conversions.

Lambda is the Cholesky-like factor of the random effects covariance matrix. It is block-diagonal, with blocks corresponding to grouping factors.

Theta Parameterization (lme4-compatible):

Theta contains elements of the lower triangular Cholesky factor L, where the random effects covariance is Sigma = sigma^2 * Lambda * Lambda’ and Lambda is built from theta.

Theta is on the relative scale (divided by sigma), following lme4’s convention.

Examples:
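A minimal numeric sketch of the parameterization (sigma and theta values are illustrative):

```python
import numpy as np

# theta = [L00, L10, L11]: lower triangle of the relative Cholesky
# factor for a 2x2 random-effects block, column-major.
theta = np.array([1.5, 0.3, 1.2])
L = np.array([[theta[0], 0.0],
              [theta[1], theta[2]]])

sigma = 2.0                            # residual SD (illustrative)
Sigma = sigma**2 * (L @ L.T)           # Sigma = sigma^2 * Lambda * Lambda'

sds = np.sqrt(np.diag(Sigma))          # SDs on the data scale: [3.0, ~2.474]
rho = Sigma[0, 1] / (sds[0] * sds[1])  # correlation: ~0.2425
```

Because theta is on the relative scale, rescaling sigma rescales the random-effect SDs but leaves the correlation unchanged.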

During optimization, theta is constrained: diagonal elements are bounded below at 0, while off-diagonal elements are unbounded.

Functions:

NameDescription
build_lambda_sparseBuild sparse block-diagonal Lambda matrix from theta.
cholesky_block_to_thetaConvert lower-triangular Cholesky block to theta vector.
theta_to_cholesky_blockConvert theta vector to lower-triangular Cholesky block.
theta_to_variance_paramsConvert theta (Cholesky elements) to variance parameters.
Functions
build_lambda_sparse
build_lambda_sparse(theta: np.ndarray, n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> sp.csc_matrix

Build sparse block-diagonal Lambda matrix from theta.

Lambda is constructed based on the random effects structure:

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major).required
n_groups_listlist[int]Number of groups per grouping factor. For simple models: [n_groups] For nested/crossed: [n_groups_factor1, n_groups_factor2, ...]required
re_structurestrRandom effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed”required
metadatadict | NoneOptional metadata dict with structure information. May contain ‘re_structures_list’ for mixed models.None

Returns:

TypeDescription
csc_matrixSparse block-diagonal Lambda matrix, CSC format.

Examples:

>>> # Random intercept: (1|group)
>>> theta = np.array([1.5])
>>> Lambda = build_lambda_sparse(theta, [10], "intercept")
>>> Lambda.shape
(10, 10)
>>> # Correlated slopes: (1+x|group)
>>> theta = np.array([1.5, 0.3, 1.2])  # [L00, L10, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "slope")
>>> Lambda.shape
(36, 36)  # 18 groups x 2 RE per group
>>> # Uncorrelated slopes: (1+x||group)
>>> theta = np.array([1.5, 1.2])  # [L00, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "diagonal")
>>> Lambda.shape
(36, 36)

Notes:

cholesky_block_to_theta
cholesky_block_to_theta(L: np.ndarray) -> np.ndarray

Convert lower-triangular Cholesky block to theta vector.

Parameters:

NameTypeDescriptionDefault
LndarrayLower triangular Cholesky block, shape (dim, dim).required

Returns:

TypeDescription
ndarrayTheta vector (lower triangle elements, column-major).

Examples:

>>> L = np.array([[1.5, 0.0], [0.3, 1.2]])
>>> theta = cholesky_block_to_theta(L)
>>> theta
array([1.5, 0.3, 1.2])

Notes:

theta_to_cholesky_block
theta_to_cholesky_block(theta: np.ndarray, dim: int) -> np.ndarray

Convert theta vector to lower-triangular Cholesky block.

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major). For 2x2: [L00, L10, L11], length = 3 For 3x3: [L00, L10, L11, L20, L21, L22], length = 6required
dimintDimension of the Cholesky block (2, 3, etc.).required

Returns:

TypeDescription
ndarrayLower triangular Cholesky block, shape (dim, dim).

Examples:

>>> theta = np.array([1.5, 0.3, 1.2])
>>> L = theta_to_cholesky_block(theta, 2)
>>> L
array([[1.5, 0. ],
       [0.3, 1.2]])

Notes:

theta_to_variance_params
theta_to_variance_params(theta: np.ndarray, is_diagonal: bool = False) -> tuple[np.ndarray, np.ndarray]

Convert theta (Cholesky elements) to variance parameters.

Computes standard deviations and correlations from Cholesky factor.

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major). For intercept: [L00], length = 1 For slopes (2D): [L00, L10, L11], length = 3 For diagonal (2D): [L00, L11], length = 2required
is_diagonalboolIf True, theta contains only diagonal elements (no off-diagonal correlations).False

Returns:

TypeDescription
ndarrayTuple of (standard_deviations, correlations):
ndarray- standard_deviations: sqrt(diag(Sigma)), shape (n,)
tuple[ndarray, ndarray]- correlations: off-diagonal correlations, shape (n*(n-1)/2,)

Examples:

>>> # Correlated slopes
>>> theta = np.array([1.5, 0.3, 1.2])
>>> sds, rhos = theta_to_variance_params(theta)
>>> sds  # sqrt(diag(L @ L.T))
array([1.5       , 1.23693169])
>>> rhos  # correlation between intercept and slope
array([0.24253563])
>>> # Uncorrelated slopes
>>> theta = np.array([1.5, 1.2])
>>> sds, rhos = theta_to_variance_params(theta, is_diagonal=True)
>>> sds
array([1.5, 1.2])
>>> rhos  # empty for diagonal structure
array([])

Notes:

lambda_template

Lambda matrix templates and sparsity pattern preservation.

Provides LambdaTemplate and PatternTemplate for efficient repeated Lambda construction during optimization. Templates cache sparsity patterns so that only data values need updating when theta changes.

Pattern preservation functions ensure consistent sparsity across all theta evaluations, even at boundary conditions (theta=0), matching lme4’s behavior.

Classes:

NameDescription
LambdaTemplateTemplate for efficient Lambda matrix updates during optimization.
PatternTemplateTemplate for preserving sparsity patterns across theta evaluations.

Functions:

NameDescription
build_lambda_templateBuild a Lambda template for efficient repeated updates.
build_pattern_templateBuild pattern template for cross-theta Cholesky caching.
compute_s22_preserve_patternCompute ZL’ @ W @ ZL + I while preserving S22_pattern’s sparsity.
compute_zl_preserve_patternCompute Z @ Lambda while preserving the sparsity pattern of ZL_pattern.
infer_diagonal_dimInfer dimension for diagonal structure from metadata.
infer_slope_dimInfer dimension for slope structure from metadata.
update_lambda_from_templateUpdate Lambda matrix from template using new theta values.
Classes
LambdaTemplate
LambdaTemplate(template: sp.csc_matrix, theta_to_data_map: list[np.ndarray], re_structure: str, n_groups_list: list[int]) -> None

Template for efficient Lambda matrix updates during optimization.

Caches the sparsity pattern of Lambda so that repeated calls only update the data values, avoiding reconstruction of indices.

Attributes:

NameTypeDescription
templatecsc_matrixSparse CSC matrix with correct structure but placeholder values.
theta_to_data_maplist[ndarray]List mapping theta indices to positions in template.data. Each element is an int64 array of indices into template.data.
re_structurestrThe random effects structure type.
n_groups_listlist[int]Number of groups per factor.
Attributes
n_groups_list
n_groups_list: list[int]
re_structure
re_structure: str
template
template: sp.csc_matrix
theta_to_data_map
theta_to_data_map: list[np.ndarray]
PatternTemplate
PatternTemplate(ZL_pattern: sp.csc_matrix, S22_pattern: sp.csc_matrix, Z: sp.csc_matrix, lambda_template: 'LambdaTemplate') -> None

Template for preserving sparsity patterns across theta evaluations.

This is critical for cross-theta Cholesky caching. When theta has boundary values (theta=0), the resulting matrices would normally have fewer non-zeros. This template ensures the sparsity pattern remains constant by storing explicit zeros where needed.

Used to match lme4’s behavior where the sparsity pattern is fixed at initialization and never changes during optimization.

Attributes:

NameTypeDescription
ZL_patterncsc_matrixSparsity pattern of Z @ Lambda (computed with theta=ones).
S22_patterncsc_matrixSparsity pattern of ZL’ @ ZL + I (computed with theta=ones).
Zcsc_matrixReference to the original Z matrix for value computation.
lambda_template‘LambdaTemplate’Reference to the LambdaTemplate.
Attributes
S22_pattern
S22_pattern: sp.csc_matrix
Z
Z: sp.csc_matrix
ZL_pattern
ZL_pattern: sp.csc_matrix
lambda_template
lambda_template: 'LambdaTemplate'
Functions
build_lambda_template
build_lambda_template(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> LambdaTemplate

Build a Lambda template for efficient repeated updates.

Creates a sparse matrix template with the correct sparsity pattern, plus a mapping from theta indices to data array positions.

Parameters:

NameTypeDescriptionDefault
n_groups_listlist[int]Number of groups per grouping factor.required
re_structurestrRandom effects structure type.required
metadatadict | NoneOptional metadata dict with structure information.None

Returns:

TypeDescription
LambdaTemplateLambdaTemplate with template matrix and theta-to-data mapping.

Examples:

>>> # Create template for random intercept model
>>> template = build_lambda_template([18], "intercept")
>>> template.template.shape
(18, 18)
>>> # Update with new theta values
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda.data[0]
1.5

Notes:

build_pattern_template
build_pattern_template(Z: sp.csc_matrix, lambda_template: LambdaTemplate) -> PatternTemplate

Build pattern template for cross-theta Cholesky caching.

Computes the “maximal” sparsity patterns of ZL and S22 using theta=ones. These patterns are used to ensure consistent sparsity across all theta evaluations, even when theta components are zero (boundary conditions).

This matches lme4’s approach where the sparsity pattern is fixed at model initialization and never changes during optimization.

Parameters:

NameTypeDescriptionDefault
Zcsc_matrixRandom effects design matrix (n, q), scipy sparse CSC.required
lambda_templateLambdaTemplateLambdaTemplate with pattern information.required

Returns:

TypeDescription
PatternTemplatePatternTemplate with ZL and S22 patterns.

Examples:

>>> Z = sp.random(100, 50, density=0.1, format='csc')
>>> lambda_tpl = build_lambda_template([50], "intercept")
>>> pattern_tpl = build_pattern_template(Z, lambda_tpl)
>>> # Now use pattern_tpl for pattern-preserved computations
compute_s22_preserve_pattern
compute_s22_preserve_pattern(ZL: sp.csc_matrix, W: sp.csc_matrix, S22_pattern: sp.csc_matrix) -> sp.csc_matrix

Compute ZL’ @ W @ ZL + I while preserving S22_pattern’s sparsity.

Similar to compute_zl_preserve_pattern, this ensures S22 always has the same sparsity pattern regardless of theta values.

Parameters:

NameTypeDescriptionDefault
ZLcsc_matrixPattern-preserved ZL matrix from compute_zl_preserve_pattern.required
Wcsc_matrixDiagonal weight matrix.required
S22_patterncsc_matrixThe “maximal” S22 pattern from build_pattern_template.required

Returns:

TypeDescription
csc_matrixS22 with S22_pattern’s sparsity structure but current values.
compute_zl_preserve_pattern
compute_zl_preserve_pattern(Z: sp.csc_matrix, Lambda: sp.csc_matrix, ZL_pattern: sp.csc_matrix) -> sp.csc_matrix

Compute Z @ Lambda while preserving the sparsity pattern of ZL_pattern.

When theta has zeros (boundary), scipy’s Z @ Lambda prunes explicit zeros, changing the sparsity pattern. This function ensures the result always has the same pattern as ZL_pattern by adding explicit zeros where needed.

This is critical for CHOLMOD caching: the factorization reuses the symbolic analysis, which depends on having a consistent sparsity pattern.

Parameters:

NameTypeDescriptionDefault
Zcsc_matrixRandom effects design matrix (n, q).required
Lambdacsc_matrixCurrent Lambda matrix (may have zeros from boundary theta).required
ZL_patterncsc_matrixThe “maximal” ZL pattern from build_pattern_template.required

Returns:

TypeDescription
csc_matrixZL with ZL_pattern’s sparsity structure but current values.

Notes:

infer_diagonal_dim
infer_diagonal_dim(metadata: dict) -> int

Infer dimension for diagonal structure from metadata.

Parameters:

NameTypeDescriptionDefault
metadatadictMetadata dict possibly containing ‘re_terms’ or ‘random_names’.required

Returns:

TypeDescription
intDimension of the diagonal structure.
infer_slope_dim
infer_slope_dim(metadata: dict) -> int

Infer dimension for slope structure from metadata.

Parameters:

NameTypeDescriptionDefault
metadatadictMetadata dict possibly containing ‘re_terms’ or ‘random_names’.required

Returns:

TypeDescription
intDimension of the slope structure.
update_lambda_from_template
update_lambda_from_template(template: LambdaTemplate, theta: np.ndarray) -> sp.csc_matrix

Update Lambda matrix from template using new theta values.

Efficiently updates only the data values without rebuilding structure.

Parameters:

NameTypeDescriptionDefault
templateLambdaTemplateLambdaTemplate created by build_lambda_template.required
thetandarrayNew theta values.required

Returns:

TypeDescription
csc_matrixUpdated sparse Lambda matrix in CSC format.

Examples:

>>> template = build_lambda_template([18], "intercept")
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda[0, 0]
1.5

Notes:

lmer

Core LMM operations: PLS solving and deviance computation.

This module implements the Penalized Least Squares (PLS) formulation for linear mixed models, matching lme4’s canonical approach.

minimize: ||y - Xβ - ZΛu||² + ||u||²

where:

Architecture:

lme4 factors only S22 = Λ’Z’ZΛ + I (not the full augmented system) and uses the Schur complement to solve for beta. See:

Classes:

NameDescription
PLSInvariantsPre-computed quantities that don’t change during theta optimization.

Functions:

NameDescription
apply_sqrt_weightsApply sqrt(weights) transformation to design matrices and response.
compute_pls_invariantsPre-compute quantities that are constant during optimization.
lmm_deviance_sparseCompute LMM deviance for optimization.
solve_pls_sparseSolve Penalized Least Squares system using Schur complement.
Classes
PLSInvariants
PLSInvariants(XtX: np.ndarray, Xty: np.ndarray, X_backend: np.ndarray, y_backend: np.ndarray) -> None

Pre-computed quantities that don’t change during theta optimization.

These are computed once when the model is set up and reused for every deviance evaluation during BOBYQA optimization.

Attributes:

NameTypeDescription
XtXndarrayGram matrix X’X, shape (p, p). Backend array (JAX or NumPy).
XtyndarrayCross-product X’y, shape (p,). Backend array.
X_backendndarrayFixed effects design matrix as backend array, shape (n, p).
y_backendndarrayResponse vector as backend array, shape (n,).
Attributes
X_backend
X_backend: np.ndarray
XtX
XtX: np.ndarray
Xty
Xty: np.ndarray
y_backend
y_backend: np.ndarray
Functions
apply_sqrt_weights
apply_sqrt_weights(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, weights: np.ndarray | None) -> tuple[np.ndarray, sp.csc_matrix, np.ndarray, np.ndarray | None]

Apply sqrt(weights) transformation to design matrices and response.

Transforms the weighted LMM problem into an unweighted problem on transformed data. This is the standard approach from lme4 and MixedModels.jl.

minimize ||W^{1/2}(y - Xβ - ZΛu)||² + ||u||²

After transformation (X_w, Z_w, y_w = W^{1/2}X, W^{1/2}Z, W^{1/2}y): minimize ||y_w - X_w·β - Z_w·Λu||² + ||u||²

This is the standard unweighted PLS, so the same solver applies.

Parameters:

NameTypeDescriptionDefault
XndarrayFixed effects design matrix, shape (n, p).required
Zcsc_matrixRandom effects design matrix, sparse csc, shape (n, q).required
yndarrayResponse vector, shape (n,).required
weightsndarray | NoneObservation weights, shape (n,), or None for unweighted.required

Returns:

TypeDescription
tuple[ndarray, csc_matrix, ndarray, ndarray | None]Tuple of (X_w, Z_w, y_w, sqrtwts) where: X_w: Transformed X, shape (n, p) Z_w: Transformed Z, sparse csc, shape (n, q) y_w: Transformed y, shape (n,) sqrtwts: sqrt(weights), shape (n,), or None if weights is None
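The equivalence of the weighted and transformed problems can be checked directly (a sketch with random data and Lambda = I; not a call into this function):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(1)
n, p, q = 50, 2, 5
X = rng.standard_normal((n, p))
Z = sp.random(n, q, density=0.2, format="csc", random_state=1)
y = rng.standard_normal(n)
w = rng.uniform(0.5, 2.0, size=n)

# The sqrt(weights) transformation.
sqrtw = np.sqrt(w)
X_w = X * sqrtw[:, None]
Z_w = sp.diags(sqrtw) @ Z
y_w = y * sqrtw

# For any (beta, u), the weighted residual norm on the original data
# equals the unweighted norm on the transformed data.
beta = rng.standard_normal(p)
u = rng.standard_normal(q)
r = y - X @ beta - Z @ u               # Lambda = I for simplicity
weighted = np.sum(w * r**2)
unweighted = np.sum((y_w - X_w @ beta - Z_w @ u)**2)
assert np.isclose(weighted, unweighted)
```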
compute_pls_invariants
compute_pls_invariants(X: np.ndarray, y: np.ndarray) -> PLSInvariants

Pre-compute quantities that are constant during optimization.

Call this once before the optimization loop to avoid redundant computation. X’X and X’y don’t depend on theta, so computing them once saves work.

Parameters:

NameTypeDescriptionDefault
XndarrayFixed effects design matrix, shape (n, p).required
yndarrayResponse vector, shape (n,).required

Returns:

TypeDescription
PLSInvariantsPLSInvariants with pre-computed X’X, X’y, and backend arrays.

Examples:

>>> X = np.random.randn(100, 3)
>>> y = np.random.randn(100)
>>> inv = compute_pls_invariants(X, y)
>>> inv.XtX.shape
(3, 3)
lmm_deviance_sparse
lmm_deviance_sparse(theta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, n_groups_list: list[int], re_structure: str, method: str = 'REML', lambda_template: 'LambdaTemplate | None' = None, pls_invariants: PLSInvariants | None = None, metadata: dict | None = None, sqrtwts: np.ndarray | None = None) -> float

Compute LMM deviance for optimization.

The deviance is -2 * log-likelihood, profiled over β and σ².

-2 log L = log|Λ’Z’ZΛ+I| + n log(2π PWRSS/n) + n

-2 REML = log|Λ’Z’ZΛ+I| + log|X’V⁻¹X| + (n-p) log(2π PWRSS/(n-p)) + (n-p)

For weighted models, the deviance includes an additional adjustment: -log|W| = -2*sum(log(sqrtwts))

This follows lme4 and MixedModels.jl: X, Z, y should already be pre-transformed by sqrt(weights) using apply_sqrt_weights().

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major).required
XndarrayFixed effects design matrix, shape (n, p). Should be pre-transformed by sqrt(weights) if weighted.required
Zcsc_matrixRandom effects design matrix, shape (n, q), scipy sparse. Should be pre-transformed by sqrt(weights) if weighted.required
yndarrayResponse vector, shape (n,). Should be pre-transformed by sqrt(weights) if weighted.required
n_groups_listlist[int]Number of groups per grouping factor.required
re_structurestrRandom effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed”required
methodstrEstimation method, either “ML” or “REML”.‘REML’
lambda_template‘LambdaTemplate | None’Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse. This avoids rebuilding the sparsity pattern on each call.None
pls_invariantsPLSInvariants | NoneOptional pre-computed invariants (X’X, X’y, JAX arrays). If provided, avoids redundant matrix computation in optimization loop. Use compute_pls_invariants() to create this before the loop.None
metadatadict | NoneOptional metadata dict with structure information. Required for crossed/nested/mixed structures with non-intercept factors. Should contain ‘re_structures_list’ specifying each factor’s structure.None
sqrtwtsndarray | NoneOptional sqrt(weights), shape (n,). If provided, adds the log-determinant adjustment -2*sum(log(sqrtwts)) to the deviance. This compensates for the Jacobian from the weight transformation.None

Returns:

TypeDescription
floatDeviance value (scalar).

Examples:

>>> import numpy as np
>>> import scipy.sparse as sp
>>> # Simple random intercept model
>>> theta = np.array([1.5])
>>> X = np.random.randn(100, 2)
>>> Z = sp.random(100, 10, density=0.1, format='csc')
>>> y = np.random.randn(100)
>>> dev = lmm_deviance_sparse(theta, X, Z, y, [10], "intercept", "REML")
>>> dev > 0
True
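The deviance formulas above can also be checked arithmetically; a sketch with illustrative component values (not a call into lmm_deviance_sparse):

```python
import numpy as np

# Illustrative values for the profiled-deviance components.
n, p = 100, 3
logdet = 4.2        # log|Lambda'Z'Z Lambda + I|
logdet_RX = 1.7     # log|X'V^-1 X| term (REML only)
pwrss = 250.0       # penalized weighted RSS at the PLS solution

dev_ml = logdet + n * np.log(2 * np.pi * pwrss / n) + n
dev_reml = (logdet + logdet_RX
            + (n - p) * np.log(2 * np.pi * pwrss / (n - p)) + (n - p))

# Weighted models add the Jacobian of the sqrt(weights) transform,
# -log|W| = -2 * sum(log(sqrtwts)):
sqrtwts = np.full(n, np.sqrt(2.0))   # all prior weights equal to 2
adjustment = -2.0 * np.sum(np.log(sqrtwts))
assert np.isclose(adjustment, -n * np.log(2.0))   # -log|W| = -log(2^n)
```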

Notes:

solve_pls_sparse
solve_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, y: np.ndarray, pls_invariants: PLSInvariants | None = None) -> dict

Solve Penalized Least Squares system using Schur complement.

Uses lme4’s algorithm: factor only S22 = Λ’Z’ZΛ + I, then use Schur complement to solve for beta. This avoids factoring the full augmented system.

Algorithm (matching lme4/src/predModule.cpp): 1. Factor S22 = Λ’Z’ZΛ + I via CHOLMOD 2. Compute RZX = L^{-1} * Λ’Z’X (forward solve) 3. Compute Schur = X’X - RZX’ * RZX (rank update) 4. Solve for beta via Schur complement 5. Back-solve for u

ZΛ is kept sparse throughout. For InstEval-scale data (73k obs × 4k groups), this uses ~50MB instead of ~2.4GB per iteration.

Parameters:

NameTypeDescriptionDefault
XndarrayFixed effects design matrix, shape (n, p).required
Zcsc_matrixRandom effects design matrix, shape (n, q), scipy sparse CSC.required
Lambdacsc_matrixCholesky-like factor, shape (q, q), scipy sparse CSC.required
yndarrayResponse vector, shape (n,).required
pls_invariantsPLSInvariants | NoneOptional pre-computed invariants (X’X, X’y, backend arrays). If provided, avoids redundant computation in optimization loop. Use compute_pls_invariants() to create this before the loop.None

Returns:

TypeDescription
dictDictionary with keys:
dict- beta: Fixed effects coefficients, shape (p,).
dict- u: Spherical random effects, shape (q,).
dict- logdet_L: Log-determinant of (Λ’Z’ZΛ + I).
dict- rss: Residual sum of squares
dict- logdet_RX: Log-determinant of Schur complement (for REML).

Examples:

>>> import numpy as np
>>> import scipy.sparse as sp
>>> n, p, q = 100, 3, 10
>>> X = np.random.randn(n, p)
>>> Z = sp.random(n, q, density=0.1, format='csc')
>>> Lambda = sp.eye(q, format='csc')
>>> y = np.random.randn(n)
>>> result = solve_pls_sparse(X, Z, Lambda, y)
>>> result['beta'].shape
(3,)
>>> result['u'].shape
(10,)
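The five-step Schur-complement algorithm can be sketched with SciPy's splu standing in for CHOLMOD (the helper name solve_pls_schur and all data are illustrative, not this module's implementation):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def solve_pls_schur(X, Z, Lambda, y):
    """Sketch: factor only S22 and recover beta via the Schur complement."""
    ZL = (Z @ Lambda).tocsc()
    S22 = (ZL.T @ ZL + sp.eye(ZL.shape[1], format="csc")).tocsc()
    lu = splu(S22)                        # factor S22 only (CHOLMOD in lme4)
    ZLtX = np.asarray(ZL.T @ X)           # dense (q, p)
    ZLty = ZL.T @ y
    schur = X.T @ X - ZLtX.T @ lu.solve(ZLtX)   # X'X - RZX'RZX
    rhs = X.T @ y - ZLtX.T @ lu.solve(ZLty)
    beta = np.linalg.solve(schur, rhs)          # solve for beta
    u = lu.solve(ZLty - ZLtX @ beta)            # back-solve for u
    return beta, u

# Cross-check against the dense augmented normal equations.
rng = np.random.default_rng(2)
n, p, q = 60, 2, 8
X = rng.standard_normal((n, p))
Z = sp.random(n, q, density=0.3, format="csc", random_state=2)
Lambda = sp.eye(q, format="csc") * 0.7
y = rng.standard_normal(n)
beta, u = solve_pls_schur(X, Z, Lambda, y)

ZLd = (Z @ Lambda).toarray()
A = np.block([[X.T @ X, X.T @ ZLd],
              [ZLd.T @ X, ZLd.T @ ZLd + np.eye(q)]])
sol = np.linalg.solve(A, np.concatenate([X.T @ y, ZLd.T @ y]))
assert np.allclose(np.concatenate([beta, u]), sol)
```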

Notes:

optimize

BOBYQA optimizer via NLOPT for variance component optimization.

Functions:

NameDescription
optimize_thetaOptimize theta using BOBYQA via NLOPT.
Functions
optimize_theta
optimize_theta(objective: Callable, theta0: np.ndarray, lower: np.ndarray, upper: np.ndarray, rhobeg: float = 0.002, rhoend: float = 2e-07, maxfun: int = 10000, verbose: bool = False) -> dict

Optimize theta using BOBYQA via NLOPT.

BOBYQA is Powell’s derivative-free trust-region method for bound-constrained optimization. This implementation uses NLOPT’s LN_BOBYQA algorithm.

Parameters:

NameTypeDescriptionDefault
objectiveCallableFunction f(theta) -> scalar deviance to minimize.required
theta0ndarrayInitial parameter values (array-like).required
lowerndarrayLower bounds for parameters.required
upperndarrayUpper bounds for parameters.required
rhobegfloatInitial trust region radius (sets the initial step size). Default 0.002, matching lme4.0.002
rhoendfloatFinal trust region radius (used to set convergence tolerances). Default 2e-7, matching lme4.2e-07
maxfunintMaximum function evaluations. Default 10000.10000
verboseboolPrint optimization progress.False

Returns:

TypeDescription
dictdict with keys: - theta: Optimized parameters (np.ndarray) - fun: Final objective value (float) - converged: Whether optimization succeeded (bool) - n_evals: Number of function evaluations (int) - message: Status message (str)

Examples:

>>> def rosenbrock(theta):
...     x, y = theta
...     return (1 - x)**2 + 100 * (y - x**2)**2
>>> result = optimize_theta(
...     rosenbrock,
...     theta0=[0.0, 0.0],
...     lower=[-2.0, -2.0],
...     upper=[2.0, 2.0]
... )
>>> result['converged']
True
>>> np.allclose(result['theta'], [1.0, 1.0], atol=1e-3)
True

pirls_sparse

Sparse PIRLS (Penalized Iteratively Reweighted Least Squares) core routines.

This module contains the inner-loop routines for fitting generalized linear mixed models via the Laplace approximation. These are called by the orchestrator in glmer.py and are not intended for direct use.

lme4 external.cpp (pwrssUpdate), MixedModels.jl pirls!
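A self-contained sketch of the PIRLS inner loop for a Bernoulli/logit model, with beta and theta held fixed (the helper name pirls_u and all data are illustrative, not this module's API):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def pirls_u(ZL, y, offset, max_iter=50, tol=1e-10):
    """Toy PIRLS: solve for spherical u with beta folded into `offset`
    and theta folded into ZL = Z @ Lambda (logit link)."""
    q = ZL.shape[1]
    u = np.zeros(q)
    I = sp.eye(q, format="csc")
    for _ in range(max_iter):
        eta = offset + ZL @ u
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)                  # IRLS weights (logit link)
        z = (eta - offset) + (y - mu) / w    # working response, random part
        WZL = sp.diags(w) @ ZL
        u_new = spsolve((ZL.T @ WZL + I).tocsc(), ZL.T @ (w * z))
        if np.max(np.abs(u_new - u)) < tol:
            return u_new
        u = u_new
    return u

rng = np.random.default_rng(3)
n, q = 200, 5
groups = rng.integers(0, q, size=n)
Z = sp.csc_matrix((np.ones(n), (np.arange(n), groups)), shape=(n, q))
ZL = Z * 0.8                                 # Lambda = 0.8 * I (illustrative)
offset = 0.2 * rng.standard_normal(n)        # stands in for X @ beta
p_true = 1.0 / (1.0 + np.exp(-(offset + ZL @ rng.standard_normal(q))))
y = (rng.uniform(size=n) < p_true).astype(float)

u_hat = pirls_u(ZL, y, offset)
# At the fixed point, the penalized score ZL'(y - mu) - u vanishes.
mu_hat = 1.0 / (1.0 + np.exp(-(offset + ZL @ u_hat)))
assert np.max(np.abs(ZL.T @ (y - mu_hat) - u_hat)) < 1e-6
```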

Functions
compute_glmm_loglik
compute_glmm_loglik(y: np.ndarray, mu: np.ndarray, family: Family, sqrL: float, logdet: float, prior_weights: np.ndarray | None = None) -> float

Compute GLMM log-likelihood via Laplace approximation.

Matches lme4’s logLik() for GLMMs: loglik = cond_loglik - 0.5 * (||u||^2 + logdet)

where cond_loglik is the full conditional log-likelihood including normalizing constants (log(y!), log(choose(m,y)), etc.).

For weighted binomial (grouped), uses the full binomial log-likelihood with log(choose(m, y_counts)) to match lme4’s aic() function.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `y` | `ndarray` | Response values (n,). For weighted binomial, these are proportions. | required |
| `mu` | `ndarray` | Fitted values (n,). | required |
| `family` | `Family` | GLM family with a `loglik` method. | required |
| `sqrL` | `float` | Sum of squared spherical random effects, `u'u`. | required |
| `logdet` | `float` | Log-determinant `log det(L'Z'WZL + I)`. | required |
| `prior_weights` | `ndarray` or `None` | Prior weights (n,). For binomial proportions, the number of trials per observation. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `float` | Log-likelihood (scalar). |
compute_irls_quantities
compute_irls_quantities(y: np.ndarray, eta: np.ndarray, family: Family) -> tuple[np.ndarray, np.ndarray]

Compute IRLS working weights and working response.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `y` | `ndarray` | Response values (n,) | required |
| `eta` | `ndarray` | Linear predictor (n,) | required |
| `family` | `Family` | GLM family object | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `working_weights` | `ndarray` | IRLS weights `w = 1 / (V(mu) * (deta/dmu)^2)` |
| `working_response` | `ndarray` | Pseudo-response `z = eta + (y - mu) * deta/dmu` |
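As an illustration of these formulas, here is a minimal NumPy sketch (not this module's implementation) specialized to the Bernoulli/logit family, where `V(mu) = mu*(1-mu)` and `deta/dmu = 1/(mu*(1-mu))`, so the weights simplify to `mu*(1-mu)`:

```python
import numpy as np

def irls_quantities_logit(y, eta):
    """Working weights and pseudo-response for the canonical logit link.

    w = 1 / (V(mu) * (deta/dmu)^2)  simplifies to  mu * (1 - mu)
    z = eta + (y - mu) * deta/dmu   =  eta + (y - mu) / (mu * (1 - mu))
    """
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = mu * (1.0 - mu)          # = dmu/deta = V(mu) for the canonical link
    z = eta + (y - mu) / w
    return w, z

y = np.array([0.0, 1.0, 1.0])
eta = np.zeros(3)
w, z = irls_quantities_logit(y, eta)
# at eta = 0: mu = 0.5, so w = 0.25 and z = (y - 0.5) / 0.25 = 4y - 2
```

At a flat starting point `eta = 0` the pseudo-response is simply a rescaled, centered copy of `y`, which is why IRLS for logistic models often starts from the null linear predictor.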
glmm_deviance
glmm_deviance(y: np.ndarray, mu: np.ndarray, family: Family, logdet: float, sqrL: float, prior_weights: np.ndarray | None = None) -> float

Compute GLMM deviance via Laplace approximation.

deviance_GLMM = sum(dev_resid) + logdet + ||u||^2

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `y` | `ndarray` | Response values (n,) | required |
| `mu` | `ndarray` | Fitted values `mu = g^{-1}(eta)` (n,) | required |
| `family` | `Family` | GLM family | required |
| `logdet` | `float` | Log-determinant `log det(L'Z'WZL + I)` | required |
| `sqrL` | `float` | Sum of squared spherical random effects, `u'u` | required |
| `prior_weights` | `ndarray` or `None` | Prior weights for observations (n,). For binomial with proportions, this is the number of trials. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `float` | GLMM deviance (scalar) |
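A toy sketch of how the three terms combine, assuming Poisson deviance residuals (illustrative only, not this module's code):

```python
import numpy as np

def poisson_dev_resids(y, mu):
    # Poisson unit deviance: 2*(y*log(y/mu) - (y - mu)), with y*log(y/mu) = 0 at y = 0
    term = np.where(y > 0, y * np.log(np.where(y > 0, y / mu, 1.0)), 0.0)
    return 2.0 * (term - (y - mu))

def laplace_glmm_deviance(y, mu, logdet, u):
    # deviance_GLMM = sum(dev_resid) + logdet + ||u||^2
    return float(np.sum(poisson_dev_resids(y, mu)) + logdet + u @ u)

y = np.array([1.0, 2.0])
mu = np.array([1.0, 2.0])   # perfect fit: deviance residuals are zero
dev = laplace_glmm_deviance(y, mu, logdet=0.5, u=np.array([1.0, 1.0]))
# dev = 0.0 + 0.5 + 2.0 = 2.5
```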
pirls_sparse
pirls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, y: np.ndarray, family: Family, eta_init: np.ndarray | None = None, prior_weights: np.ndarray | None = None, beta_fixed: np.ndarray | None = None, max_iter: int = 30, tol: float = 1e-07, verbose: bool = False, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict

Penalized Iteratively Reweighted Least Squares using sparse operations.

For fixed variance parameters encoded in Lambda, iteratively solves for beta and u using weighted penalized least squares. This is the inner loop of GLMM optimization.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `X` | `ndarray` | Fixed effects design matrix (n, p) | required |
| `Z` | `csc_matrix` | Random effects design matrix (n, q), scipy sparse | required |
| `Lambda` | `csc_matrix` | Cholesky-like factor from theta (q, q), scipy sparse | required |
| `y` | `ndarray` | Response vector (n,) | required |
| `family` | `Family` | GLM family (binomial, poisson) | required |
| `eta_init` | `ndarray` or `None` | Initial linear predictor (optional) | `None` |
| `prior_weights` | `ndarray` or `None` | Prior observation weights (n,). For binomial with proportions, this should be the number of trials. | `None` |
| `beta_fixed` | `ndarray` or `None` | If provided, solve only for u with beta held fixed. Used in Stage 2 optimization to evaluate deviance at arbitrary (theta, beta) points without re-optimizing beta. | `None` |
| `max_iter` | `int` | Maximum PIRLS iterations | `30` |
| `tol` | `float` | Convergence tolerance on relative deviance change (1e-7 matches lme4) | `1e-07` |
| `verbose` | `bool` | Print iteration info | `False` |
| `factor_S22` | `SparseFactorization` or `None` | Cached Cholesky factor for the S22 block (`L'Z'WZL + I`). If provided, reuses symbolic analysis across calls; useful for caching across theta evaluations in the optimizer. | `None` |
| `factor_S` | `SparseFactorization` or `None` | Cached Cholesky factor for the full augmented system. Only used when `beta_fixed` is `None` (full solve mode). | `None` |
| `pattern_template` | `PatternTemplate` or `None` | Template for preserving sparsity patterns. When provided, ensures S22 has a consistent pattern for cross-theta caching; required for cross-theta factor caching to work correctly when theta has boundary values (theta = 0). | `None` |

Returns:

`dict` with keys:

- `beta`: Fixed effects coefficients (p,)
- `u`: Spherical random effects (q,)
- `eta`: Final linear predictor (n,)
- `mu`: Final fitted values (n,)
- `deviance`: GLMM deviance at convergence
- `converged`: Whether PIRLS converged
- `n_iter`: Number of iterations
- `logdet_L`: Log-determinant `log det(L'Z'WZL + I)`
- `factor_S22`: Cholesky factor for S22 (for caching)
- `factor_S`: Cholesky factor for the full system (for caching, full mode only)
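The PIRLS iteration can be illustrated end-to-end with a dense toy sketch for a random-intercept logistic model. This is a didactic re-implementation on made-up data, not this module's sparse code:

```python
import numpy as np

rng = np.random.default_rng(42)
n, q, p = 60, 3, 2
grp = rng.integers(0, q, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Zd = np.eye(q)[grp]                   # dense random-intercept indicators
Lam = 0.8 * np.eye(q)                 # Lambda for fixed theta = 0.8
b = rng.normal(size=q)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 0.5 * X[:, 1] + b[grp])))).astype(float)

ZL = Zd @ Lam
eta, dev_prev = np.zeros(n), np.inf
for n_iter in range(1, 31):
    mu = 1 / (1 + np.exp(-eta))
    w = mu * (1 - mu)                 # logit working weights
    z = eta + (y - mu) / w            # working response
    Xw, ZLw = X * w[:, None], ZL * w[:, None]
    # Augmented weighted PLS system for (beta, u)
    S = np.block([[X.T @ Xw,  Xw.T @ ZL],
                  [ZL.T @ Xw, ZLw.T @ ZL + np.eye(q)]])
    rhs = np.concatenate([Xw.T @ z, ZLw.T @ z])
    beta, u = np.split(np.linalg.solve(S, rhs), [p])
    eta = X @ beta + ZL @ u
    mu = 1 / (1 + np.exp(-eta))
    # Laplace deviance: binomial deviance residuals + logdet + penalty
    dev_res = -2 * (y * np.log(mu) + (1 - y) * np.log1p(-mu))
    w_new = mu * (1 - mu)
    logdet = np.linalg.slogdet((ZL * w_new[:, None]).T @ ZL + np.eye(q))[1]
    dev = dev_res.sum() + logdet + u @ u
    if abs(dev - dev_prev) < 1e-7 * (abs(dev) + 0.1):
        break
    dev_prev = dev
```

The real routine differs in using sparse Cholesky factorizations with cached symbolic analysis, but the iteration structure (working weights, working response, augmented solve, deviance check) is the same.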
solve_weighted_pls_sparse
solve_weighted_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, z: np.ndarray, weights: np.ndarray, beta_fixed: np.ndarray | None = None, ZL: sp.csc_matrix | None = None, ZL_dense: np.ndarray | None = None, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict

Solve weighted Penalized Least Squares for GLMM.

    [ X'WX     X'WZL       ] [beta]   [ X'Wz   ]
    [ L'Z'WX   L'Z'WZL + I ] [ u  ] = [ L'Z'Wz ]

using sparse Cholesky factorization.

If beta_fixed is provided, solves only for u with beta held constant:

    [L'Z'WZL + I] [u] = [L'Z'W(z - X*beta)]

This is critical for Stage 2 optimization where we need to evaluate deviance at arbitrary (theta, beta) points without PIRLS re-optimizing beta.
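A dense NumPy sketch (illustrative only, not the sparse implementation) showing that the reduced beta_fixed solve reproduces the u from the full augmented solve when beta is the joint optimum:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 25, 2, 4
X = rng.normal(size=(n, p))
Z = (rng.random((n, q)) < 0.4).astype(float)
Lam = np.diag(rng.uniform(0.3, 1.2, q))   # stand-in for Lambda
z = rng.normal(size=n)
w = rng.uniform(0.1, 1.0, n)              # IRLS weights

ZL = Z @ Lam
Xw, ZLw = X * w[:, None], ZL * w[:, None]

# Full augmented solve for (beta, u)
S = np.block([[X.T @ Xw,  Xw.T @ ZL],
              [ZL.T @ Xw, ZLw.T @ ZL + np.eye(q)]])
rhs = np.concatenate([Xw.T @ z, ZLw.T @ z])
beta, u_full = np.split(np.linalg.solve(S, rhs), [p])

# Reduced solve with beta held fixed: [L'Z'WZL + I] u = L'Z'W (z - X beta)
S22 = ZLw.T @ ZL + np.eye(q)
u_fixed = np.linalg.solve(S22, ZLw.T @ (z - X @ beta))
# u_fixed equals u_full because the second block row of the full system
# is exactly the reduced system once beta is substituted in
```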

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `X` | `ndarray` | Fixed effects design matrix (n, p) | required |
| `Z` | `csc_matrix` | Random effects design matrix (n, q), scipy sparse CSC | required |
| `Lambda` | `csc_matrix` | Cholesky-like factor (q, q), scipy sparse CSC | required |
| `z` | `ndarray` | Working response (n,) | required |
| `weights` | `ndarray` | IRLS weights (n,) | required |
| `beta_fixed` | `ndarray` or `None` | If provided, solve only for u with beta fixed to this value. | `None` |
| `ZL` | `csc_matrix` or `None` | Pre-computed `Z @ Lambda` (sparse). If provided, avoids redundant computation when called repeatedly with the same Z and Lambda. | `None` |
| `ZL_dense` | `ndarray` or `None` | Pre-computed `ZL.toarray()` (dense). If provided, avoids repeated sparse-to-dense materialization on each call. | `None` |
| `factor_S22` | `SparseFactorization` or `None` | Cached Cholesky factor for the S22 block. If provided, uses `cholesky_inplace()` to update with new values (same sparsity pattern), avoiding expensive symbolic analysis on repeated calls. | `None` |
| `factor_S` | `SparseFactorization` or `None` | Cached Cholesky factor for the full augmented system S. Only used in full solve mode (when `beta_fixed` is `None`). | `None` |
| `pattern_template` | `PatternTemplate` or `None` | Template for preserving sparsity patterns. When provided, ensures S22 has a consistent pattern for cross-theta caching; required when using `factor_S22` with cross-theta caching. | `None` |

Returns:

`dict` with keys:

- `beta`: Fixed effects coefficients (p,)
- `u`: Spherical random effects (q,)
- `logdet_L`: Log-determinant of `(L'Z'WZL + I)`
- `fitted_z`: Fitted working response `X beta + ZL u`
- `factor_S22`: Cholesky factor for S22 (for caching)
- `factor_S`: Cholesky factor for the full system S (for caching, full mode only)

quadrature

Adaptive Gauss-Hermite Quadrature (AGQ) for GLMM marginal likelihood.

This module implements nAGQ > 1 integration for scalar random intercept models. The AGQ deviance replaces the Laplace deviance in Stage 2 of GLMM fitting when higher accuracy is needed (few observations per group, extreme probabilities).

Restriction: nAGQ > 1 only works for models with a single scalar random intercept (e.g., (1|group)). This matches lme4’s restriction.

Math

For group i with conditional mode u_hat_i and conditional SD sigma_hat_i:

.. math::

   u_{ij} = \hat{u}_i + \sqrt{2}\,\hat{\sigma}_i\,z_j
   \quad (\text{adaptive nodes})

.. math::

   h(u_{ij}) = \sum_{k \in i} \log f\bigl(y_k \mid \mu(\eta_{ij,k})\bigr)
   - \tfrac{1}{2} u_{ij}^2
   \quad (\text{log integrand})

.. math::

   \log I_i = \log(\hat{\sigma}_i)
   + \operatorname{logsumexp}_j\bigl(\log(w_j/\sqrt{\pi}) + h(u_{ij}) + z_j^2\bigr)

.. math::

   \text{deviance} = -2 \sum_i \log I_i
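The formulas above can be checked with a small NumPy-only sketch (illustrative, not this module's code). For a Gaussian log-integrand the adaptive rule is exact, which gives a convenient self-check:

```python
import numpy as np

def agq_log_integral(h, u_hat, sigma_hat, k=5):
    """log I = log(sigma) + logsumexp_j(log(w_j/sqrt(pi)) + h(u_j) + z_j^2),
    with adaptive nodes u_j = u_hat + sqrt(2)*sigma_hat*z_j.
    h(u) must include the -u^2/2 penalty term."""
    z, w = np.polynomial.hermite.hermgauss(k)   # Gauss-Hermite nodes/weights
    u = u_hat + np.sqrt(2.0) * sigma_hat * z
    terms = np.log(w / np.sqrt(np.pi)) + np.array([h(ui) for ui in u]) + z**2
    m = terms.max()                             # logsumexp, done stably
    return np.log(sigma_hat) + m + np.log(np.exp(terms - m).sum())

# Self-check: take the "conditional log-likelihood" to be a N(a, s^2) density
# in u; the marginal integral against N(0, 1) is then N(a; 0, 1 + s^2),
# and AGQ at the exact mode/SD reproduces it to machine precision.
a, s = 0.7, 0.5
h = lambda u: -(u - a)**2 / (2*s**2) - np.log(np.sqrt(2*np.pi)*s) - u**2/2
u_hat = a / (1 + s**2)                  # conditional mode of h
sigma_hat = s / np.sqrt(1 + s**2)       # conditional SD (from -1/h'')
exact = -0.5*np.log(2*np.pi*(1 + s**2)) - a**2 / (2*(1 + s**2))
approx = agq_log_integral(h, u_hat, sigma_hat, k=5)
```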

Reference

Pinheiro & Bates (1995), Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model, JCGS 4(1):12-35.

lme4 source: glmFamily.cpp aic() and laplace().

Functions:

NameDescription
compute_agq_devianceCompute AGQ deviance for Stage 2 optimization.
Functions
compute_agq_deviance
compute_agq_deviance(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, group_ids: np.ndarray, nAGQ: int, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, eta_init: np.ndarray | None = None, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None) -> float

Compute AGQ deviance for Stage 2 optimization.

This is the objective function for nAGQ > 1. It:

  1. Runs PIRLS with beta_fixed to get conditional modes u_hat

  2. Computes per-group conditional SDs from the S22 diagonal

  3. Evaluates the AGQ integral using Gauss-Hermite quadrature

  4. Returns -2 * sum_i(log_integral_i)

Only valid for scalar random intercept models (single grouping factor, intercept only).
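For step 2 above, the scalar-intercept case makes S22 diagonal, so the per-group conditional SDs fall out directly. A hedged sketch (illustrative values, not this module's code):

```python
import numpy as np

# For a scalar random intercept with Lambda = theta * I, each row of ZL has one
# nonzero, so S22 = L'Z'WZL + I is diagonal with entries
#   S22_ii = theta^2 * sum_{k in group i} w_k + 1,
# and the conditional SD of u_i is sigma_i = 1 / sqrt(S22_ii).
theta = 0.9
w = np.array([0.2, 0.25, 0.1, 0.3, 0.15])   # IRLS weights at the modes
grp = np.array([0, 0, 1, 1, 1])             # integer-coded group ids
s22_diag = theta**2 * np.bincount(grp, weights=w) + 1.0
sigma = 1.0 / np.sqrt(s22_diag)
```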

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `theta` | `ndarray` | Cholesky factor elements (scalar for random intercept). | required |
| `beta` | `ndarray` | Fixed effects coefficients (p,). | required |
| `X` | `ndarray` | Fixed effects design matrix (n, p). | required |
| `Z` | `csc_matrix` | Random effects design matrix (n, q), scipy sparse. | required |
| `y` | `ndarray` | Response vector (n,). | required |
| `family` | `Family` | GLM family (binomial, poisson). | required |
| `n_groups_list` | `list[int]` | Number of groups per grouping factor. | required |
| `re_structure` | `str` | Random effects structure type. | required |
| `group_ids` | `ndarray` | Group membership array (n,), integer-coded. | required |
| `nAGQ` | `int` | Number of quadrature points (must be > 1). | required |
| `metadata` | `dict` or `None` | Additional RE metadata. | `None` |
| `prior_weights` | `ndarray` or `None` | Prior observation weights. | `None` |
| `pirls_max_iter` | `int` | Maximum PIRLS iterations per evaluation. | `30` |
| `pirls_tol` | `float` | PIRLS convergence tolerance. | `1e-07` |
| `eta_init` | `ndarray` or `None` | Initial linear predictor for PIRLS. | `None` |
| `lambda_template` | `LambdaTemplate` or `None` | Pre-built template for efficient Lambda updates. | `None` |
| `factor_cache` | `dict` or `None` | Mutable dict for caching Cholesky factors. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `float` | AGQ deviance (scalar), `-2 * sum(log marginal likelihood)`. |