
Model-specific fitting algorithms (IRLS, PLS, PIRLS).

Classes:

- LambdaTemplate: Template for efficient Lambda matrix updates during optimization.
- PLSInvariants: Pre-computed quantities that don’t change during theta optimization.
- PatternTemplate: Template for preserving sparsity patterns across theta evaluations.

Functions:

- apply_sqrt_weights: Apply sqrt(weights) transformation to design matrices and response.
- build_lambda_sparse: Build sparse block-diagonal Lambda matrix from theta.
- build_lambda_template: Build a Lambda template for efficient repeated updates.
- compute_agq_deviance: Compute AGQ deviance for Stage 2 optimization.
- compute_irls_quantities: Compute IRLS working weights and working response.
- compute_pls_invariants: Pre-compute quantities that are constant during optimization.
- fit_glm_irls: Fit GLM using IRLS algorithm.
- fit_glmm_pirls: Fit GLMM using PIRLS with outer optimization over theta.
- glmm_deviance: Compute GLMM deviance via Laplace approximation.
- glmm_deviance_objective: Compute GLMM deviance for outer optimization.
- lmm_deviance_sparse: Compute LMM deviance for optimization.
- optimize_theta: Optimize theta using BOBYQA via NLOPT.
- solve_pls_sparse: Solve Penalized Least Squares system using Schur complement.
- solve_weighted_pls_sparse: Solve weighted Penalized Least Squares for GLMM.
- theta_to_cholesky_block: Convert theta vector to lower-triangular Cholesky block.
- update_lambda_from_template: Update Lambda matrix from template using new theta values.

Modules:

- glm: GLM IRLS fitting algorithm.
- glmer: PIRLS (Penalized Iteratively Reweighted Least Squares) algorithm for GLMMs.
- heuristics: Heuristics and numerical utilities for GLMM fitting.
- initialization: Moment-based initialization for LMM variance components.
- lambda_builder: Lambda matrix construction from theta parameters.
- lambda_sparse: Sparse Lambda matrix construction and theta-Cholesky conversions.
- lambda_template: Lambda matrix templates and sparsity pattern preservation.
- lmer: Core LMM operations: PLS solving and deviance computation.
- optimize: BOBYQA optimizer via NLOPT for variance component optimization.
- pirls_sparse: Sparse PIRLS (Penalized Iteratively Reweighted Least Squares) core routines.
- quadrature: Adaptive Gauss-Hermite Quadrature (AGQ) for GLMM marginal likelihood.

Classes

LambdaTemplate

LambdaTemplate(template: sp.csc_matrix, theta_to_data_map: list[np.ndarray], re_structure: str, n_groups_list: list[int]) -> None

Template for efficient Lambda matrix updates during optimization.

Caches the sparsity pattern of Lambda so that repeated calls only update the data values, avoiding reconstruction of indices.

Attributes:

- template (csc_matrix): Sparse CSC matrix with correct structure but placeholder values.
- theta_to_data_map (list[ndarray]): List mapping theta indices to positions in template.data. Each element is an int64 array of indices into template.data.
- re_structure (str): The random effects structure type.
- n_groups_list (list[int]): Number of groups per factor.

PLSInvariants

PLSInvariants(XtX: np.ndarray, Xty: np.ndarray, X_backend: np.ndarray, y_backend: np.ndarray) -> None

Pre-computed quantities that don’t change during theta optimization.

These are computed once when the model is set up and reused for every deviance evaluation during BOBYQA optimization.

Attributes:

- XtX (ndarray): Gram matrix X’X, shape (p, p). Backend array (JAX or NumPy).
- Xty (ndarray): Cross-product X’y, shape (p,). Backend array.
- X_backend (ndarray): Fixed effects design matrix as backend array, shape (n, p).
- y_backend (ndarray): Response vector as backend array, shape (n,).

PatternTemplate

PatternTemplate(ZL_pattern: sp.csc_matrix, S22_pattern: sp.csc_matrix, Z: sp.csc_matrix, lambda_template: 'LambdaTemplate') -> None

Template for preserving sparsity patterns across theta evaluations.

This is critical for cross-theta Cholesky caching. When theta has boundary values (theta=0), the resulting matrices would normally have fewer non-zeros. This template ensures the sparsity pattern remains constant by storing explicit zeros where needed.

Used to match lme4’s behavior where the sparsity pattern is fixed at initialization and never changes during optimization.
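The explicit-zero trick can be illustrated with plain scipy (a minimal sketch, not this module’s implementation; `update_values` is a hypothetical helper):

```python
import numpy as np
import scipy.sparse as sp

# Build the pattern at theta = 1 so every structural position is stored.
pattern = sp.csc_matrix(np.diag([1.0, 1.0, 1.0]))

def update_values(pattern, theta):
    """Write new values into the stored entries only (hypothetical helper)."""
    out = pattern.copy()
    out.data[:] = theta  # no eliminate_zeros(): zeros stay as stored entries
    return out

A = update_values(pattern, 0.0)  # boundary value theta = 0
print(A.nnz)  # → 3: explicit zeros keep the sparsity pattern fixed
```

Because the `indices`/`indptr` arrays never change, a cached symbolic Cholesky analysis remains valid at every theta, including the boundary.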

Attributes:

- ZL_pattern (csc_matrix): Sparsity pattern of Z @ Lambda (computed with theta=ones).
- S22_pattern (csc_matrix): Sparsity pattern of ZL’ @ ZL + I (computed with theta=ones).
- Z (csc_matrix): Reference to the original Z matrix for value computation.
- lambda_template (LambdaTemplate): Reference to the LambdaTemplate.

Functions

apply_sqrt_weights

apply_sqrt_weights(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, weights: np.ndarray | None) -> tuple[np.ndarray, sp.csc_matrix, np.ndarray, np.ndarray | None]

Apply sqrt(weights) transformation to design matrices and response.

Transforms the weighted LMM problem into an unweighted problem on transformed data. This is the standard approach from lme4 and MixedModels.jl.

minimize ||W^{1/2}(y - Xβ - ZΛu)||² + ||u||²

After transformation (X_w, Z_w, y_w = W^{1/2}X, W^{1/2}Z, W^{1/2}y): minimize ||y_w - X_w·β - Z_w·Λu||² + ||u||²

This is the standard unweighted PLS, so the same solver applies.

Parameters:

- X (ndarray): Fixed effects design matrix, shape (n, p). Required.
- Z (csc_matrix): Random effects design matrix, sparse CSC, shape (n, q). Required.
- y (ndarray): Response vector, shape (n,). Required.
- weights (ndarray | None): Observation weights, shape (n,), or None for unweighted. Required.

Returns:

tuple[ndarray, csc_matrix, ndarray, ndarray | None]: Tuple of (X_w, Z_w, y_w, sqrtwts) where:

- X_w: Transformed X, shape (n, p)
- Z_w: Transformed Z, sparse CSC, shape (n, q)
- y_w: Transformed y, shape (n,)
- sqrtwts: sqrt(weights), shape (n,), or None if weights is None
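A minimal sketch of the transformation (the function name here is hypothetical; the actual implementation lives in this module):

```python
import numpy as np
import scipy.sparse as sp

def apply_sqrt_weights_sketch(X, Z, y, weights):
    """Scale the rows of X, Z and y by sqrt(weights) (sketch)."""
    if weights is None:
        return X, Z, y, None
    sqrtwts = np.sqrt(weights)
    X_w = X * sqrtwts[:, None]              # dense row scaling
    Z_w = (sp.diags(sqrtwts) @ Z).tocsc()   # sparse row scaling
    y_w = y * sqrtwts
    return X_w, Z_w, y_w, sqrtwts

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))
Z = sp.random(5, 3, density=0.5, format="csc", random_state=0)
y = rng.standard_normal(5)
w = np.array([1.0, 4.0, 1.0, 0.25, 1.0])
X_w, Z_w, y_w, sqrtwts = apply_sqrt_weights_sketch(X, Z, y, w)
```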

build_lambda_sparse

build_lambda_sparse(theta: np.ndarray, n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> sp.csc_matrix

Build sparse block-diagonal Lambda matrix from theta.

Lambda is constructed block-diagonally according to the random effects structure given by re_structure.

Parameters:

- theta (ndarray): Cholesky factor elements (lower triangle, column-major). Required.
- n_groups_list (list[int]): Number of groups per grouping factor. For simple models: [n_groups]. For nested/crossed: [n_groups_factor1, n_groups_factor2, ...]. Required.
- re_structure (str): Random effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed”. Required.
- metadata (dict | None): Optional metadata dict with structure information. May contain ‘re_structures_list’ for mixed models. Default: None.

Returns:

csc_matrix: Sparse block-diagonal Lambda matrix, CSC format.

Examples:

>>> # Random intercept: (1|group)
>>> theta = np.array([1.5])
>>> Lambda = build_lambda_sparse(theta, [10], "intercept")
>>> Lambda.shape
(10, 10)
>>> # Correlated slopes: (1+x|group)
>>> theta = np.array([1.5, 0.3, 1.2])  # [L00, L10, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "slope")
>>> Lambda.shape
(36, 36)  # 18 groups x 2 RE per group
>>> # Uncorrelated slopes: (1+x||group)
>>> theta = np.array([1.5, 1.2])  # [L00, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "diagonal")
>>> Lambda.shape
(36, 36)


build_lambda_template

build_lambda_template(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> LambdaTemplate

Build a Lambda template for efficient repeated updates.

Creates a sparse matrix template with the correct sparsity pattern, plus a mapping from theta indices to data array positions.

Parameters:

- n_groups_list (list[int]): Number of groups per grouping factor. Required.
- re_structure (str): Random effects structure type. Required.
- metadata (dict | None): Optional metadata dict with structure information. Default: None.

Returns:

LambdaTemplate: LambdaTemplate with template matrix and theta-to-data mapping.

Examples:

>>> # Create template for random intercept model
>>> template = build_lambda_template([18], "intercept")
>>> template.template.shape
(18, 18)
>>> # Update with new theta values
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda.data[0]
1.5


compute_agq_deviance

compute_agq_deviance(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, group_ids: np.ndarray, nAGQ: int, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, eta_init: np.ndarray | None = None, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None) -> float

Compute AGQ deviance for Stage 2 optimization.

This is the objective function for nAGQ > 1. It:

  1. Runs PIRLS with beta_fixed to get conditional modes u_hat

  2. Computes per-group conditional SDs from the S22 diagonal

  3. Evaluates the AGQ integral using Gauss-Hermite quadrature

  4. Returns -2 * sum_i(log_integral_i)

Only valid for scalar random intercept models (single grouping factor, intercept only).

Parameters:

- theta (ndarray): Cholesky factor elements (scalar for random intercept). Required.
- beta (ndarray): Fixed effects coefficients (p,). Required.
- X (ndarray): Fixed effects design matrix (n, p). Required.
- Z (csc_matrix): Random effects design matrix (n, q), scipy sparse. Required.
- y (ndarray): Response vector (n,). Required.
- family (Family): GLM family (binomial, poisson). Required.
- n_groups_list (list[int]): Number of groups per grouping factor. Required.
- re_structure (str): Random effects structure type. Required.
- group_ids (ndarray): Group membership array (n,), integer-coded. Required.
- nAGQ (int): Number of quadrature points (must be > 1). Required.
- metadata (dict | None): Additional RE metadata. Default: None.
- prior_weights (ndarray | None): Prior observation weights. Default: None.
- pirls_max_iter (int): Maximum PIRLS iterations per evaluation. Default: 30.
- pirls_tol (float): PIRLS convergence tolerance. Default: 1e-07.
- eta_init (ndarray | None): Initial linear predictor for PIRLS. Default: None.
- lambda_template (LambdaTemplate | None): Pre-built template for efficient Lambda updates. Default: None.
- factor_cache (dict | None): Mutable dict for caching Cholesky factors. Default: None.

Returns:

float: AGQ deviance (scalar) = -2 * sum(log marginal likelihood).

compute_irls_quantities

compute_irls_quantities(y: np.ndarray, eta: np.ndarray, family: Family) -> tuple[np.ndarray, np.ndarray]

Compute IRLS working weights and working response.

Parameters:

- y (ndarray): Response values (n,). Required.
- eta (ndarray): Linear predictor (n,). Required.
- family (Family): GLM family object. Required.

Returns:

- working_weights (ndarray): IRLS weights w = 1 / (V(mu) * (deta/dmu)^2)
- working_response (ndarray): Pseudo-response z = eta + (y - mu) * deta/dmu
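As a concrete instance: for a Poisson family with log link, V(mu) = mu and deta/dmu = 1/mu, so the formulas above reduce to w = mu and z = eta + (y - mu)/mu. A standalone sketch (not the module’s backend-generic code):

```python
import numpy as np

def poisson_irls_quantities(y, eta):
    """Working weights and response for Poisson with log link (sketch)."""
    mu = np.exp(eta)          # inverse link
    w = mu                    # 1 / (V(mu) * (deta/dmu)^2) = 1 / (mu * mu^{-2})
    z = eta + (y - mu) / mu   # eta + (y - mu) * deta/dmu
    return w, z

y = np.array([0.0, 2.0, 5.0])
eta = np.array([0.0, 0.5, 1.5])
w, z = poisson_irls_quantities(y, eta)
```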

compute_pls_invariants

compute_pls_invariants(X: np.ndarray, y: np.ndarray) -> PLSInvariants

Pre-compute quantities that are constant during optimization.

Call this once before the optimization loop to avoid redundant computation. X’X and X’y don’t depend on theta, so computing them once saves work.

Parameters:

- X (ndarray): Fixed effects design matrix, shape (n, p). Required.
- y (ndarray): Response vector, shape (n,). Required.

Returns:

PLSInvariants: PLSInvariants with pre-computed X’X, X’y, and backend arrays.

Examples:

>>> X = np.random.randn(100, 3)
>>> y = np.random.randn(100)
>>> inv = compute_pls_invariants(X, y)
>>> inv.XtX.shape
(3, 3)

fit_glm_irls

fit_glm_irls(y: np.ndarray, X: np.ndarray, family: Family, weights: np.ndarray | None = None, max_iter: int = 25, tol: float = 1e-08) -> dict

Fit GLM using IRLS algorithm.

Implements Iteratively Reweighted Least Squares for generalized linear models. The algorithm iterates until convergence (relative deviance change < tol) or max_iter is reached.

The core fitting loop is JIT-compiled for efficiency. All family-specific operations (link, variance, deviance) are pure JAX functions from the Family object.

Parameters:

- y (ndarray): Response values (n,). Required.
- X (ndarray): Design matrix (n, p). Required.
- family (Family): Family object with link, variance, and deviance functions. Required.
- weights (ndarray | None): Prior observation weights (n,), or None for equal weights. Default: None.
- max_iter (int): Maximum IRLS iterations. Default: 25.
- tol (float): Convergence tolerance for relative deviance change. Default: 1e-08.

Returns:

Dictionary with keys:

- coef: Coefficient estimates (p,)
- vcov: Variance-covariance matrix (p, p)
- dispersion: Dispersion parameter estimate
- fitted: Fitted values μ on response scale (n,)
- linear_predictor: Linear predictor η on link scale (n,)
- residuals: Response residuals y - μ (n,)
- deviance: Residual deviance
- null_deviance: Null model deviance
- df_residual: Residual degrees of freedom
- loglik: Log-likelihood
- converged: True if converged, False otherwise
- n_iter: Number of iterations performed
- has_separation: Separation detection (binomial only, None otherwise)
- irls_weights: Final IRLS weights (n,)
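The IRLS loop can be sketched for one concrete family. Below is a minimal NumPy version for a binomial GLM with logit link; this is a sketch only (the real implementation is JIT-compiled and family-generic), and `irls_logistic` is a hypothetical name:

```python
import numpy as np
from scipy.special import xlogy

def irls_logistic(y, X, max_iter=25, tol=1e-8):
    """Minimal IRLS for a binomial GLM with logit link (sketch only)."""
    beta = np.zeros(X.shape[1])
    eta = X @ beta
    dev_old = np.inf
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)                # 1 / (V(mu) * (deta/dmu)^2)
        z = eta + (y - mu) / w             # working response
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)   # weighted LS step
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        # binomial deviance; xlogy handles the 0*log(0) = 0 convention
        dev = 2.0 * np.sum(xlogy(y, y / mu) + xlogy(1 - y, (1 - y) / (1 - mu)))
        if abs(dev - dev_old) < tol * (abs(dev) + 0.1):
            break
        dev_old = dev
    return beta

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.standard_normal(200)])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0]))))
y = (rng.random(200) < p_true).astype(float)
beta_hat = irls_logistic(y, X)
```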


fit_glmm_pirls

fit_glmm_pirls(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, theta_init: np.ndarray | None = None, prior_weights: np.ndarray | None = None, max_outer_iter: int = 10000, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, two_stage: bool = True, lambda_template: 'LambdaTemplate | None' = None, nAGQ: int = 1, group_ids: np.ndarray | None = None) -> dict

Fit GLMM using PIRLS with outer optimization over theta.

The optimization strategy follows lme4’s two-stage approach.

Parameters:

- X (ndarray): Fixed effects design matrix (n, p). Required.
- Z (csc_matrix): Random effects design matrix (n, q), scipy sparse. Required.
- y (ndarray): Response vector (n,). Required.
- family (Family): GLM family (binomial, poisson). Required.
- n_groups_list (list[int]): Number of groups per grouping factor. Required.
- re_structure (str): Random effects structure type. Required.
- metadata (dict | None): Additional RE metadata. Default: None.
- theta_init (ndarray | None): Initial theta values (optional). Default: None.
- prior_weights (ndarray | None): Prior observation weights. Default: None.
- max_outer_iter (int): Maximum BOBYQA iterations for Stage 1. Default: 10000.
- pirls_max_iter (int): Maximum PIRLS iterations per theta evaluation. Default: 30.
- pirls_tol (float): PIRLS convergence tolerance (1e-7 matches lme4). Default: 1e-07.
- verbose (bool): Print optimization progress. Default: False.
- two_stage (bool): Use two-stage optimization matching lme4. Default: True.
- lambda_template (LambdaTemplate | None): Optional pre-built template for efficient Lambda updates. If provided, uses this template instead of building a new one. Useful for bootstrap, where the same template can be reused. Default: None.
- nAGQ (int): Number of adaptive Gauss-Hermite quadrature points. nAGQ=1 uses the Laplace approximation; nAGQ > 1 uses AGQ for more accurate integration (scalar random intercept only). Default: 1.
- group_ids (ndarray | None): Group membership array (n,), integer-coded. Required when nAGQ > 1 for per-group quadrature. Default: None.

Returns:

Dictionary with keys:

- theta: Optimized theta parameters
- beta: Fixed effects coefficients
- u: Spherical random effects
- eta: Linear predictor
- mu: Fitted values
- deviance: Final GLMM deviance
- loglik: Laplace log-likelihood
- converged: Whether optimizer converged
- n_outer_iter: Number of outer iterations
- n_func_evals: Number of deviance evaluations
- pirls_converged: Whether final PIRLS converged
- pirls_n_iter: Final PIRLS iterations

glmm_deviance

glmm_deviance(y: np.ndarray, mu: np.ndarray, family: Family, logdet: float, sqrL: float, prior_weights: np.ndarray | None = None) -> float

Compute GLMM deviance via Laplace approximation.

deviance_GLMM = sum(dev_resid) + logdet + ||u||^2

Parameters:

- y (ndarray): Response values (n,). Required.
- mu (ndarray): Fitted values mu = g^{-1}(eta) (n,). Required.
- family (Family): GLM family. Required.
- logdet (float): Log-determinant log|L’Z’WZL + I|. Required.
- sqrL (float): Sum of squared spherical random effects ||u||^2. Required.
- prior_weights (ndarray | None): Prior weights for observations (n,). For binomial with proportions, this is the number of trials. Default: None.

Returns:

float: GLMM deviance (scalar).
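The deviance is just the sum of three scalar pieces. A minimal sketch, using Poisson deviance residuals as a concrete (hypothetical) example and an arbitrary logdet value:

```python
import numpy as np
from scipy.special import xlogy

def glmm_deviance_sketch(dev_resids, logdet, u):
    """Laplace deviance: sum(dev_resid) + log|L'Z'WZL + I| + ||u||^2."""
    return float(np.sum(dev_resids) + logdet + np.dot(u, u))

# Poisson deviance residuals: d_i = 2*(y*log(y/mu) - (y - mu))
y = np.array([1.0, 3.0, 0.0])
mu = np.array([1.2, 2.5, 0.4])
dev_resids = 2.0 * (xlogy(y, y / mu) - (y - mu))
u = np.array([0.1, -0.2])
dev = glmm_deviance_sketch(dev_resids, logdet=0.8, u=u)
```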

glmm_deviance_objective

glmm_deviance_objective(theta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None, pattern_template: 'PatternTemplate | None' = None, pirls_result_cache: dict | None = None) -> float

Compute GLMM deviance for outer optimization.

This is the objective function for BOBYQA optimization over theta.

Parameters:

- theta (ndarray): Cholesky factor elements (lower triangle, column-major). Required.
- X (ndarray): Fixed effects design matrix (n, p). Required.
- Z (csc_matrix): Random effects design matrix (n, q), scipy sparse. Required.
- y (ndarray): Response vector (n,). Required.
- family (Family): GLM family (binomial, poisson). Required.
- n_groups_list (list[int]): Number of groups per grouping factor. Required.
- re_structure (str): Random effects structure type. Required.
- metadata (dict | None): Additional RE metadata. Default: None.
- prior_weights (ndarray | None): Prior observation weights. Default: None.
- pirls_max_iter (int): Maximum PIRLS iterations per theta evaluation. Default: 30.
- pirls_tol (float): PIRLS convergence tolerance (1e-7 matches lme4). Default: 1e-07.
- verbose (bool): Print verbose output. Default: False.
- lambda_template (LambdaTemplate | None): Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse, avoiding a rebuild of the sparsity pattern on each call (~10-15% speedup). Default: None.
- factor_cache (dict | None): Optional mutable dict for caching Cholesky factors across theta evaluations. Should contain keys ‘factor_S22’ and ‘factor_S’. Pass an empty dict {} to enable caching; factors are stored and reused across calls, avoiding repeated symbolic analysis. Default: None.
- pattern_template (PatternTemplate | None): PatternTemplate for preserving sparsity patterns. Required for cross-theta factor caching. When provided, ensures S22 has a consistent sparsity pattern even when theta hits boundaries (theta=0). Default: None.
- pirls_result_cache (dict | None): Optional mutable dict for caching the last PIRLS result. When provided, stores {"theta": theta_copy, "result": pirls_result} on each call so the caller can reuse the result at the converged theta without a redundant PIRLS evaluation. Default: None.

Returns:

float: GLMM deviance (scalar), the Laplace approximation of -2*logLik.

lmm_deviance_sparse

lmm_deviance_sparse(theta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, n_groups_list: list[int], re_structure: str, method: str = 'REML', lambda_template: 'LambdaTemplate | None' = None, pls_invariants: PLSInvariants | None = None, metadata: dict | None = None, sqrtwts: np.ndarray | None = None) -> float

Compute LMM deviance for optimization.

The deviance is -2 * log-likelihood, profiled over β and σ².

-2 log L = log|Λ’Z’ZΛ + I| + n log(2π·PWRSS/n) + n

-2 REML = log|Λ’Z’ZΛ + I| + log|X’V⁻¹X| + (n-p) log(2π·PWRSS/(n-p)) + (n-p)

For weighted models, the deviance includes an additional adjustment: -log|W| = -2*sum(log(sqrtwts))

This follows lme4 and MixedModels.jl: X, Z, y should already be pre-transformed by sqrt(weights) using apply_sqrt_weights().

Parameters:

- theta (ndarray): Cholesky factor elements (lower triangle, column-major). Required.
- X (ndarray): Fixed effects design matrix, shape (n, p). Should be pre-transformed by sqrt(weights) if weighted. Required.
- Z (csc_matrix): Random effects design matrix, shape (n, q), scipy sparse. Should be pre-transformed by sqrt(weights) if weighted. Required.
- y (ndarray): Response vector, shape (n,). Should be pre-transformed by sqrt(weights) if weighted. Required.
- n_groups_list (list[int]): Number of groups per grouping factor. Required.
- re_structure (str): Random effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed”. Required.
- method (str): Estimation method, either “ML” or “REML”. Default: “REML”.
- lambda_template (LambdaTemplate | None): Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse, avoiding a rebuild of the sparsity pattern on each call. Default: None.
- pls_invariants (PLSInvariants | None): Optional pre-computed invariants (X’X, X’y, JAX arrays). If provided, avoids redundant matrix computation in the optimization loop. Use compute_pls_invariants() to create this before the loop. Default: None.
- metadata (dict | None): Optional metadata dict with structure information. Required for crossed/nested/mixed structures with non-intercept factors. Should contain ‘re_structures_list’ specifying each factor’s structure. Default: None.
- sqrtwts (ndarray | None): Optional sqrt(weights), shape (n,). If provided, adds the log-determinant adjustment -2*sum(log(sqrtwts)) to the deviance, compensating for the Jacobian of the weight transformation. Default: None.

Returns:

float: Deviance value (scalar).

Examples:

>>> import numpy as np
>>> import scipy.sparse as sp
>>> # Simple random intercept model
>>> theta = np.array([1.5])
>>> X = np.random.randn(100, 2)
>>> Z = sp.random(100, 10, density=0.1, format='csc')
>>> y = np.random.randn(100)
>>> dev = lmm_deviance_sparse(theta, X, Z, y, [10], "intercept", "REML")
>>> dev > 0
True


optimize_theta

optimize_theta(objective: Callable, theta0: np.ndarray, lower: np.ndarray, upper: np.ndarray, rhobeg: float = 0.002, rhoend: float = 2e-07, maxfun: int = 10000, verbose: bool = False) -> dict

Optimize theta using BOBYQA via NLOPT.

BOBYQA is Powell’s derivative-free trust-region method for bound-constrained optimization. This implementation uses NLOPT’s LN_BOBYQA algorithm.

Parameters:

- objective (Callable): Function f(theta) -> scalar deviance to minimize. Required.
- theta0 (ndarray): Initial parameter values (array-like). Required.
- lower (ndarray): Lower bounds for parameters. Required.
- upper (ndarray): Upper bounds for parameters. Required.
- rhobeg (float): Initial trust region radius (sets the initial step size). Default: 0.002 (lme4’s default).
- rhoend (float): Final trust region radius (used to set tolerances). Default: 2e-07 (lme4’s default).
- maxfun (int): Maximum function evaluations. Default: 10000.
- verbose (bool): Print optimization progress. Default: False.

Returns:

Dictionary with keys:

- theta: Optimized parameters (np.ndarray)
- fun: Final objective value (float)
- converged: Whether optimization succeeded (bool)
- n_evals: Number of function evaluations (int)
- message: Status message (str)

Examples:

>>> def rosenbrock(theta):
...     x, y = theta
...     return (1 - x)**2 + 100 * (y - x**2)**2
>>> result = optimize_theta(
...     rosenbrock,
...     theta0=[0.0, 0.0],
...     lower=[-2.0, -2.0],
...     upper=[2.0, 2.0]
... )
>>> result['converged']
True
>>> np.allclose(result['theta'], [1.0, 1.0], atol=1e-3)
True

solve_pls_sparse

solve_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, y: np.ndarray, pls_invariants: PLSInvariants | None = None) -> dict

Solve Penalized Least Squares system using Schur complement.

Uses lme4’s algorithm: factor only S22 = Λ’Z’ZΛ + I, then use Schur complement to solve for beta. This avoids factoring the full augmented system.

Algorithm (matching lme4/src/predModule.cpp):

1. Factor S22 = Λ’Z’ZΛ + I via CHOLMOD
2. Compute RZX = L^{-1} * Λ’Z’X (forward solve)
3. Compute Schur = X’X - RZX’ * RZX (rank update)
4. Solve for beta via Schur complement
5. Back-solve for u

ZΛ is kept sparse throughout. For InstEval-scale data (73k obs × 4k groups), this uses ~50MB instead of ~2.4GB per iteration.

Parameters:

- X (ndarray): Fixed effects design matrix, shape (n, p). Required.
- Z (csc_matrix): Random effects design matrix, shape (n, q), scipy sparse CSC. Required.
- Lambda (csc_matrix): Cholesky-like factor, shape (q, q), scipy sparse CSC. Required.
- y (ndarray): Response vector, shape (n,). Required.
- pls_invariants (PLSInvariants | None): Optional pre-computed invariants (X’X, X’y, backend arrays). If provided, avoids redundant computation in the optimization loop. Use compute_pls_invariants() to create this before the loop. Default: None.

Returns:

Dictionary with keys:

- beta: Fixed effects coefficients, shape (p,).
- u: Spherical random effects, shape (q,).
- logdet_L: Log-determinant of (Λ’Z’ZΛ + I).
- rss: Residual sum of squares.
- logdet_RX: Log-determinant of Schur complement (for REML).

Examples:

>>> import numpy as np
>>> import scipy.sparse as sp
>>> n, p, q = 100, 3, 10
>>> X = np.random.randn(n, p)
>>> Z = sp.random(n, q, density=0.1, format='csc')
>>> Lambda = sp.eye(q, format='csc')
>>> y = np.random.randn(n)
>>> result = solve_pls_sparse(X, Z, Lambda, y)
>>> result['beta'].shape
(3,)
>>> result['u'].shape
(10,)
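The five Schur-complement steps can also be written out on dense matrices for clarity (a sketch under that simplification only; the real implementation keeps ZΛ sparse and factors S22 with CHOLMOD):

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import solve_triangular

def solve_pls_dense_sketch(X, Z, Lambda, y):
    """Dense sketch of the Schur-complement PLS solve (steps 1-5)."""
    ZL = (Z @ Lambda).toarray()
    S22 = ZL.T @ ZL + np.eye(ZL.shape[1])
    L = np.linalg.cholesky(S22)                              # 1. factor S22
    RZX = solve_triangular(L, ZL.T @ X, lower=True)          # 2. RZX = L^{-1} ZL'X
    cu = solve_triangular(L, ZL.T @ y, lower=True)
    schur = X.T @ X - RZX.T @ RZX                            # 3. rank update
    beta = np.linalg.solve(schur, X.T @ y - RZX.T @ cu)      # 4. beta via Schur
    u = solve_triangular(L.T, cu - RZX @ beta, lower=False)  # 5. back-solve u
    return beta, u

n, p, q = 50, 2, 4
rng = np.random.default_rng(2)
X = rng.standard_normal((n, p))
Z = sp.random(n, q, density=0.3, format="csc", random_state=2)
Lambda = sp.eye(q, format="csc")
y = rng.standard_normal(n)
beta, u = solve_pls_dense_sketch(X, Z, Lambda, y)
```

The result matches a direct solve of the full augmented normal equations; the Schur route simply avoids ever forming and factoring that larger system.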


solve_weighted_pls_sparse

solve_weighted_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, z: np.ndarray, weights: np.ndarray, beta_fixed: np.ndarray | None = None, ZL: sp.csc_matrix | None = None, ZL_dense: np.ndarray | None = None, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict

Solve weighted Penalized Least Squares for GLMM.

    [X’WX    X’WZL       ] [beta]   [X’Wz ]
    [L’Z’WX  L’Z’WZL + I ] [u   ] = [L’Z’Wz]

using sparse Cholesky factorization.

If beta_fixed is provided, solves only for u with beta held constant: [L’Z’WZL + I] [u] = [L’Z’W(z - X*beta)]

This is critical for Stage 2 optimization where we need to evaluate deviance at arbitrary (theta, beta) points without PIRLS re-optimizing beta.

Parameters:

- X (ndarray): Fixed effects design matrix (n, p). Required.
- Z (csc_matrix): Random effects design matrix (n, q), scipy sparse CSC. Required.
- Lambda (csc_matrix): Cholesky-like factor (q, q), scipy sparse CSC. Required.
- z (ndarray): Working response (n,). Required.
- weights (ndarray): IRLS weights (n,). Required.
- beta_fixed (ndarray | None): If provided, solve only for u with beta fixed to this value. Default: None.
- ZL (csc_matrix | None): Pre-computed Z @ Lambda (sparse). If provided, avoids redundant computation when called repeatedly with the same Z and Lambda. Default: None.
- ZL_dense (ndarray | None): Pre-computed ZL.toarray() (dense). If provided, avoids repeated sparse-to-dense materialization on each call. Default: None.
- factor_S22 (SparseFactorization | None): Cached Cholesky factor for the S22 block. If provided, uses cholesky_inplace() to update with new values (same sparsity pattern), avoiding expensive symbolic analysis on repeated calls. Default: None.
- factor_S (SparseFactorization | None): Cached Cholesky factor for the full augmented system S. Only used in full solve mode (when beta_fixed is None). Default: None.
- pattern_template (PatternTemplate | None): PatternTemplate for preserving sparsity patterns. When provided, ensures S22 has a consistent pattern for cross-theta caching. Required when using factor_S22 with cross-theta caching. Default: None.

Returns:

Dictionary with keys:

- beta: Fixed effects coefficients (p,)
- u: Spherical random effects (q,)
- logdet_L: Log-determinant of (L’Z’WZL + I)
- fitted_z: Fitted working response X*beta + ZL*u
- factor_S22: Cholesky factor for S22 (for caching)
- factor_S: Cholesky factor for full system S (for caching, full mode only)

theta_to_cholesky_block

theta_to_cholesky_block(theta: np.ndarray, dim: int) -> np.ndarray

Convert theta vector to lower-triangular Cholesky block.

Parameters:

- theta (ndarray): Cholesky factor elements (lower triangle, column-major). For 2x2: [L00, L10, L11], length 3. For 3x3: [L00, L10, L11, L20, L21, L22], length 6. Required.
- dim (int): Dimension of the Cholesky block (2, 3, etc.). Required.

Returns:

ndarray: Lower-triangular Cholesky block, shape (dim, dim).

Examples:

>>> theta = np.array([1.5, 0.3, 1.2])
>>> L = theta_to_cholesky_block(theta, 2)
>>> L
array([[1.5, 0. ],
       [0.3, 1.2]])


update_lambda_from_template

update_lambda_from_template(template: LambdaTemplate, theta: np.ndarray) -> sp.csc_matrix

Update Lambda matrix from template using new theta values.

Efficiently updates only the data values without rebuilding structure.

Parameters:

- template (LambdaTemplate): LambdaTemplate created by build_lambda_template. Required.
- theta (ndarray): New theta values. Required.

Returns:

csc_matrix: Updated sparse Lambda matrix in CSC format.

Examples:

>>> template = build_lambda_template([18], "intercept")
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda[0, 0]
1.5


Modules

glm

GLM IRLS fitting algorithm.

This module implements the Iteratively Reweighted Least Squares (IRLS) algorithm for GLM fitting. The algorithm supports both JAX and NumPy backends.

All functions are designed to work with Family objects.

Functions:

- compute_loglik: Compute log-likelihood for a fitted GLM.
- compute_null_deviance: Compute null deviance for the intercept-only model.
- fit_glm_irls: Fit GLM using IRLS algorithm.

Functions
compute_loglik
compute_loglik(y: np.ndarray, mu: np.ndarray, deviance: float | np.ndarray, dispersion: float | np.ndarray, family: Family, n: int, xp: Any) -> np.ndarray | float

Compute log-likelihood for a fitted GLM.

Uses family-specific formulas for exact log-likelihood computation. Falls back to the deviance-based approximation for unknown families.

Parameters:

- y (ndarray): Response values, shape (n,). Required.
- mu (ndarray): Fitted values on response scale, shape (n,). Required.
- deviance (float | ndarray): Residual deviance (scalar). Required.
- dispersion (float | ndarray): Dispersion parameter estimate (scalar). Required.
- family (Family): Family object with name attribute. Required.
- n (int): Number of observations. Required.
- xp (Any): Array module (numpy or jax.numpy). Required.

Returns:

ndarray | float: Log-likelihood as a scalar.
compute_null_deviance
compute_null_deviance(y: np.ndarray, family: Family, xp: Any) -> np.ndarray | float

Compute null deviance for the intercept-only model.

The null deviance measures how well the response is predicted by a model with only an intercept (the mean). It is used as a baseline for assessing model fit improvement.

Parameters:

- y (ndarray): Response values, shape (n,). Required.
- family (Family): Family object with deviance function. Required.
- xp (Any): Array module (numpy or jax.numpy). Required.

Returns:

ndarray | float: Null deviance as a scalar.
fit_glm_irls
fit_glm_irls(y: np.ndarray, X: np.ndarray, family: Family, weights: np.ndarray | None = None, max_iter: int = 25, tol: float = 1e-08) -> dict

Fit GLM using IRLS algorithm.

Implements Iteratively Reweighted Least Squares for generalized linear models. The algorithm iterates until convergence (relative deviance change < tol) or max_iter is reached.

The core fitting loop is JIT-compiled for efficiency. All family-specific operations (link, variance, deviance) are pure JAX functions from the Family object.

Parameters:

NameTypeDescriptionDefault
yndarrayResponse values (n,)required
XndarrayDesign matrix (n, p)required
familyFamilyFamily object with link, variance, and deviance functionsrequired
weightsndarray | NonePrior observation weights (n,) or None for equal weightsNone
max_iterintMaximum IRLS iterations (default: 25)25
tolfloatConvergence tolerance for relative deviance change (default: 1e-8)1e-08

Returns:

TypeDescription
dictDictionary with keys:
dict- coef: Coefficient estimates (p,)
dict- vcov: Variance-covariance matrix (p, p)
dict- dispersion: Dispersion parameter estimate
dict- fitted: Fitted values μ on response scale (n,)
dict- linear_predictor: Linear predictor η on link scale (n,)
dict- residuals: Response residuals y - μ (n,)
dict- deviance: Residual deviance
dict- null_deviance: Null model deviance
dict- df_residual: Residual degrees of freedom
dict- loglik: Log-likelihood
dict- converged: True if converged, False otherwise
dict- n_iter: Number of iterations performed
dict- has_separation: Separation detection (binomial only, None otherwise)
dict- irls_weights: Final IRLS weights (n,)

Notes:

glmer

PIRLS (Penalized Iteratively Reweighted Least Squares) algorithm for GLMMs.

This module implements the PIRLS algorithm for fitting generalized linear mixed models using Laplace approximation. The algorithm combines IRLS (from GLMs) with penalized least squares (from LMMs).

PIRLS Algorithm:

For fixed variance components theta:

1. Initialize eta from a GLM fit.
2. Until convergence:
   a. Compute working weights: w = 1 / (V(mu) * (deta/dmu)^2)
   b. Compute working response: z = eta + (y - mu) * deta/dmu
   c. Solve weighted penalized least squares:

      [X'WX     X'WZL      ] [beta]   [X'Wz  ]
      [L'Z'WX   L'Z'WZL + I] [u   ] = [L'Z'Wz]

   d. Update: eta = X*beta + ZL*u
   e. Check convergence on |delta_eta|
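One inner step of this loop, the weighted penalized least-squares solve, can be written densely (an illustrative sketch; the module works with sparse matrices and a Schur-complement solve):

```python
import numpy as np

def pls_step(X, ZL, w, z):
    """One weighted penalized least-squares solve, dense for clarity."""
    p, q = X.shape[1], ZL.shape[1]
    WX = X * w[:, None]
    WZL = ZL * w[:, None]
    # Augmented normal equations of the penalized problem:
    # minimize sum(w * (z - X beta - ZL u)^2) + ||u||^2.
    A = np.block([
        [X.T @ WX,  X.T @ WZL],
        [ZL.T @ WX, ZL.T @ WZL + np.eye(q)],
    ])
    rhs = np.concatenate([X.T @ (w * z), ZL.T @ (w * z)])
    sol = np.linalg.solve(A, rhs)
    return sol[:p], sol[p:]

rng = np.random.default_rng(1)
n, p, q = 12, 2, 3
X = rng.normal(size=(n, p))
ZL = rng.normal(size=(n, q))
w = rng.uniform(0.5, 2.0, size=n)
z = rng.normal(size=n)
beta, u = pls_step(X, ZL, w, z)
```

At the solution, the weighted residual is orthogonal to the columns of X, and its projection onto ZL equals u (the ridge penalty's stationarity condition).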

GLMM Deviance (Laplace approximation):

-2 log L ~ sum(dev_resid) + log|L'Z'WZL + I| + ||u||^2

Reference:

lme4 external.cpp:
- pwrssUpdate (lines 308-375)

MixedModels.jl generalizedlinearmixedmodel.jl:
- pirls! function (lines 600-655)
- deviance function (lines 84-109)

Functions:

NameDescription
compute_glmm_loglikCompute GLMM log-likelihood via Laplace approximation.
compute_hessian_vcovCompute Hessian-based vcov for fixed effects.
compute_irls_quantitiesCompute IRLS working weights and working response.
deriv12Compute gradient and Hessian using central finite differences.
fit_glmm_pirlsFit GLMM using PIRLS with outer optimization over theta.
get_theta_lower_boundsGet lower bounds for theta parameters.
glmm_devianceCompute GLMM deviance via Laplace approximation.
glmm_deviance_at_fixed_theta_betaCompute GLMM deviance at fixed (theta, beta) by solving only for u.
glmm_deviance_objectiveCompute GLMM deviance for outer optimization.
init_theta_glmmInitialize theta for GLMM fitting.
pirls_sparsePenalized Iteratively Reweighted Least Squares using sparse operations.
solve_weighted_pls_sparseSolve weighted Penalized Least Squares for GLMM.
Classes
Functions
compute_glmm_loglik
compute_glmm_loglik(y: np.ndarray, mu: np.ndarray, family: Family, sqrL: float, logdet: float, prior_weights: np.ndarray | None = None) -> float

Compute GLMM log-likelihood via Laplace approximation.

Matches lme4’s logLik() for GLMMs: loglik = cond_loglik - 0.5 * (||u||^2 + logdet)

where cond_loglik is the full conditional log-likelihood including normalizing constants (log(y!), log(choose(m,y)), etc.).

For weighted binomial (grouped), uses the full binomial log-likelihood with log(choose(m, y_counts)) to match lme4’s aic() function.

Parameters:

NameTypeDescriptionDefault
yndarrayResponse values (n,). For weighted binomial, these are proportions.required
mundarrayFitted values (n,).required
familyFamilyGLM family with loglik method.required
sqrLfloatSum of squared spherical random effects ||u||^2.required
logdetfloatLog-determinant log|L'Z'WZL + I|.required
prior_weightsndarray | NonePrior weights (n,). For binomial proportions, the number of trials per observation.None

Returns:

TypeDescription
floatLog-likelihood (scalar).
compute_hessian_vcov
compute_hessian_vcov(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, delta: float = 0.0001, eta_init: np.ndarray | None = None) -> tuple[np.ndarray, dict]

Compute Hessian-based vcov for fixed effects.

Matches lme4’s use.hessian=TRUE approach:

1. Compute the numerical Hessian of the deviance w.r.t. [theta, beta]
2. Invert the Hessian
3. Extract the beta-beta block
4. Symmetrize

This uses lme4’s deriv12() approach (simple central differences with delta=1e-4) for exact parity with R.

Parameters:

NameTypeDescriptionDefault
thetandarrayFinal theta (variance component) estimatesrequired
betandarrayFinal beta (fixed effect) estimatesrequired
XndarrayFixed effects design matrix (n, p)required
Zcsc_matrixRandom effects design matrix (n, q), scipy sparserequired
yndarrayResponse vector (n,)required
familyFamilyGLM family (binomial, poisson)required
n_groups_listlist[int]Number of groups per grouping factorrequired
re_structurestrRandom effects structure typerequired
metadatadict | NoneAdditional RE metadataNone
prior_weightsndarray | NonePrior observation weightsNone
pirls_max_iterintMaximum PIRLS iterations per deviance evaluation30
pirls_tolfloatPIRLS convergence tolerance1e-07
deltafloatFinite difference step size (default 1e-4, matches lme4)0.0001
eta_initndarray | NoneInitial linear predictor for PIRLS. If provided, used as starting point for all deviance evaluations. This provides numerical stability for models with large random effects.None

Returns:

TypeDescription
tuple[ndarray, dict]Tuple of: - vcov_beta: Variance-covariance matrix for beta, shape (p, p) - derivs: Dictionary with “gradient” and “Hessian” (for diagnostics)

Notes: If the Hessian is not positive definite, this function returns the pseudo-inverse of the beta-beta block, with a warning.

compute_irls_quantities
compute_irls_quantities(y: np.ndarray, eta: np.ndarray, family: Family) -> tuple[np.ndarray, np.ndarray]

Compute IRLS working weights and working response.

Parameters:

NameTypeDescriptionDefault
yndarrayResponse values (n,)required
etandarrayLinear predictor (n,)required
familyFamilyGLM family objectrequired

Returns:

NameTypeDescription
working_weightsndarrayIRLS weights w = 1 / (V(mu) * (deta/dmu)^2)
working_responsendarrayPseudo-response z = eta + (y - mu) * deta/dmu
deriv12
deriv12(func: Callable[[np.ndarray], float], x: np.ndarray, delta: float = 0.0001, fx: float | None = None) -> dict

Compute gradient and Hessian using central finite differences.

This matches lme4’s deriv12() function exactly (lme4/R/deriv.R). Uses simple central differences with step size delta.

The Hessian is computed simultaneously with the gradient for efficiency:
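A self-contained NumPy sketch of this central-difference scheme (illustrative; the function signature matches the one documented above, but the exact implementation in this module may differ):

```python
import numpy as np

def deriv12(func, x, delta=1e-4, fx=None):
    """Gradient and Hessian by simple central differences."""
    n = x.size
    fx = func(x) if fx is None else fx
    grad = np.zeros(n)
    hess = np.zeros((n, n))

    def f_shift(shifts):
        # Evaluate func at x with the given (index, step) perturbations applied.
        xs = x.copy()
        for i, h in shifts:
            xs[i] += h
        return func(xs)

    for i in range(n):
        fp = f_shift([(i, delta)])
        fm = f_shift([(i, -delta)])
        grad[i] = (fp - fm) / (2.0 * delta)
        hess[i, i] = (fp - 2.0 * fx + fm) / delta**2
        for j in range(i + 1, n):
            # Mixed partial via the four diagonal perturbations.
            hess[i, j] = hess[j, i] = (
                f_shift([(i, delta), (j, delta)])
                - f_shift([(i, delta), (j, -delta)])
                - f_shift([(i, -delta), (j, delta)])
                + f_shift([(i, -delta), (j, -delta)])
            ) / (4.0 * delta**2)
    return {"gradient": grad, "Hessian": hess}

def f(v):  # quadratic test function with known derivatives
    return v[0]**2 + 3.0 * v[0] * v[1] + 2.0 * v[1]**2

d = deriv12(f, np.array([1.0, 2.0]))
```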

Parameters:

NameTypeDescriptionDefault
funcCallable[[ndarray], float]Scalar function R^n -> R to differentiate.required
xndarrayPoint at which to compute derivatives, shape (n,).required
deltafloatStep size for finite differences (default 1e-4, matches lme4).0.0001
fxfloat | NoneOptional pre-computed f(x) to avoid redundant evaluation.None

Returns:

TypeDescription
dictDictionary with: - “gradient”: First derivative vector, shape (n,) - “Hessian”: Second derivative matrix, shape (n, n)

Note: This is O(h^2) accurate, matching lme4’s approach. For higher accuracy (O(h^8)), use compute_hessian_richardson from

fit_glmm_pirls
fit_glmm_pirls(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, theta_init: np.ndarray | None = None, prior_weights: np.ndarray | None = None, max_outer_iter: int = 10000, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, two_stage: bool = True, lambda_template: 'LambdaTemplate | None' = None, nAGQ: int = 1, group_ids: np.ndarray | None = None) -> dict

Fit GLMM using PIRLS with outer optimization over theta.

Optimization strategy (matching lme4):

The two-stage approach matches lme4 exactly:

Parameters:

NameTypeDescriptionDefault
XndarrayFixed effects design matrix (n, p)required
Zcsc_matrixRandom effects design matrix (n, q), scipy sparserequired
yndarrayResponse vector (n,)required
familyFamilyGLM family (binomial, poisson)required
n_groups_listlist[int]Number of groups per grouping factorrequired
re_structurestrRandom effects structure typerequired
metadatadict | NoneAdditional RE metadataNone
theta_initndarray | NoneInitial theta values (optional)None
prior_weightsndarray | NonePrior observation weightsNone
max_outer_iterintMaximum BOBYQA iterations for Stage 110000
pirls_max_iterintMaximum PIRLS iterations per theta evaluation30
pirls_tolfloatPIRLS convergence tolerance (1e-7 matches lme4)1e-07
verboseboolPrint optimization progressFalse
two_stageboolUse two-stage optimization matching lme4 (default True)True
lambda_template‘LambdaTemplate | None’Optional pre-built template for efficient Lambda updates. If provided, uses this template instead of building a new one. This is useful for bootstrap where the same template can be reused.None
nAGQintNumber of adaptive Gauss-Hermite quadrature points (default 1). nAGQ=1 uses the Laplace approximation. nAGQ > 1 uses AGQ for more accurate integration (scalar random intercept only).1
group_idsndarray | NoneGroup membership array (n,), integer-coded. Required when nAGQ > 1 for per-group quadrature.None

Returns:

TypeDescription
dictDictionary with:
dict- theta: Optimized theta parameters
dict- beta: Fixed effects coefficients
dict- u: Spherical random effects
dict- eta: Linear predictor
dict- mu: Fitted values
dict- deviance: Final GLMM deviance
dict- loglik: Laplace log-likelihood
dict- converged: Whether optimizer converged
dict- n_outer_iter: Number of outer iterations
dict- n_func_evals: Number of deviance evaluations
dict- pirls_converged: Whether final PIRLS converged
dict- pirls_n_iter: Final PIRLS iterations
get_theta_lower_bounds
get_theta_lower_bounds(n_theta: int, re_structure: str, metadata: dict | None = None) -> list[float]

Get lower bounds for theta parameters.

Diagonal elements of Cholesky factor must be non-negative. Off-diagonal elements are unbounded.
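The rule can be sketched for a single Cholesky block (an illustration of the bound layout under column-major lower-triangle ordering; helper name is hypothetical):

```python
import numpy as np

def lower_bounds_for_block(k):
    """Lower bounds for the theta elements of one k x k Cholesky block,
    stored lower-triangle, column-major."""
    bounds = []
    for col in range(k):
        for row in range(col, k):
            # Diagonal entries must be non-negative; off-diagonals are free.
            bounds.append(0.0 if row == col else -np.inf)
    return bounds

# Correlated intercept + slope, (1+x|g): theta = [L00, L10, L11]
print(lower_bounds_for_block(2))  # [0.0, -inf, 0.0]
```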

Parameters:

NameTypeDescriptionDefault
n_thetaintNumber of theta parametersrequired
re_structurestrRandom effects structure typerequired
metadatadict | NoneOptional metadata dict with ‘re_structures_list’ for crossed/nested/mixed structuresNone

Returns:

TypeDescription
list[float]List of lower bounds
glmm_deviance
glmm_deviance(y: np.ndarray, mu: np.ndarray, family: Family, logdet: float, sqrL: float, prior_weights: np.ndarray | None = None) -> float

Compute GLMM deviance via Laplace approximation.

deviance_GLMM = sum(dev_resid) + logdet + ||u||^2

Parameters:

NameTypeDescriptionDefault
yndarrayResponse values (n,)required
mundarrayFitted values mu = g^{-1}(eta) (n,)required
familyFamilyGLM familyrequired
logdetfloatLog-determinant log|L'Z'WZL + I|.required
sqrLfloatSum of squared spherical random effects ||u||^2.required
prior_weightsndarray | NonePrior weights for observations (n,). For binomial with proportions, this is the number of trials.None

Returns:

TypeDescription
floatGLMM deviance (scalar)
glmm_deviance_at_fixed_theta_beta
glmm_deviance_at_fixed_theta_beta(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, eta_init: np.ndarray | None = None, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None, pattern_template: 'PatternTemplate | None' = None) -> float

Compute GLMM deviance at fixed (theta, beta) by solving only for u.

This is the Stage 2 objective function for joint (theta, beta) optimization. Unlike glmm_deviance_objective which optimizes beta via PIRLS, this function holds beta fixed and only solves for the random effects u.

This matches lme4’s approach in Stage 2 where beta is added to the offset before PIRLS runs, so PIRLS only solves for u.

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major)required
betandarrayFixed effects coefficients (p,) - held constantrequired
XndarrayFixed effects design matrix (n, p)required
Zcsc_matrixRandom effects design matrix (n, q), scipy sparserequired
yndarrayResponse vector (n,)required
familyFamilyGLM family (binomial, poisson)required
n_groups_listlist[int]Number of groups per grouping factorrequired
re_structurestrRandom effects structure typerequired
metadatadict | NoneAdditional RE metadataNone
prior_weightsndarray | NonePrior observation weightsNone
pirls_max_iterintMaximum PIRLS iterations per theta evaluation30
pirls_tolfloatPIRLS convergence tolerance (1e-7 matches lme4)1e-07
verboseboolPrint verbose outputFalse
eta_initndarray | NoneInitial linear predictor for PIRLS. If provided, used as starting point instead of X @ beta. This matches lme4’s lp0 mechanism where the full linear predictor from Stage 1 (Xbeta + ZLambda*u) is saved and restored for Stage 2.None
lambda_template‘LambdaTemplate | None’Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse. This avoids rebuilding the sparsity pattern on each call (~10-15% speedup).None
factor_cachedict | NoneOptional mutable dict for caching Cholesky factors across theta evaluations. Pass an empty dict {} to enable caching. Caches factor_S22 and factor_S to avoid expensive Cholesky symbolic analysis on each evaluation.None
pattern_template‘PatternTemplate | None’PatternTemplate for preserving sparsity patterns. Required for cross-theta factor caching. When provided, ensures S22 has consistent sparsity pattern even when theta hits boundaries (theta=0).None

Returns:

TypeDescription
floatGLMM deviance (scalar) at the exact (theta, beta) point
glmm_deviance_objective
glmm_deviance_objective(theta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, verbose: bool = False, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None, pattern_template: 'PatternTemplate | None' = None, pirls_result_cache: dict | None = None) -> float

Compute GLMM deviance for outer optimization.

This is the objective function for BOBYQA optimization over theta.

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major)required
XndarrayFixed effects design matrix (n, p)required
Zcsc_matrixRandom effects design matrix (n, q), scipy sparserequired
yndarrayResponse vector (n,)required
familyFamilyGLM family (binomial, poisson)required
n_groups_listlist[int]Number of groups per grouping factorrequired
re_structurestrRandom effects structure typerequired
metadatadict | NoneAdditional RE metadataNone
prior_weightsndarray | NonePrior observation weightsNone
pirls_max_iterintMaximum PIRLS iterations per theta evaluation30
pirls_tolfloatPIRLS convergence tolerance (1e-7 matches lme4)1e-07
verboseboolPrint verbose outputFalse
lambda_template‘LambdaTemplate | None’Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse. This avoids rebuilding the sparsity pattern on each call (~10-15% speedup).None
factor_cachedict | NoneOptional mutable dict for caching Cholesky factors across theta evaluations. Should contain keys ‘factor_S22’ and ‘factor_S’. Pass an empty dict {} to enable caching; factors will be stored and reused across calls, avoiding repeated symbolic analysis.None
pattern_template‘PatternTemplate | None’PatternTemplate for preserving sparsity patterns. Required for cross-theta factor caching. When provided, ensures S22 has consistent sparsity pattern even when theta hits boundaries (theta=0).None
pirls_result_cachedict | NoneOptional mutable dict for caching the last PIRLS result. When provided, stores {"theta": theta_copy, "result": pirls_result} on each call so the caller can reuse the result at the converged theta without a redundant PIRLS evaluation.None

Returns:

TypeDescription
floatGLMM deviance (scalar) - the Laplace approximation of -2*logLik
init_theta_glmm
init_theta_glmm(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> np.ndarray

Initialize theta for GLMM fitting.

Uses conservative defaults. Unlike LMMs, GLMMs work on the link scale so moment-based initialization is less useful.

Parameters:

NameTypeDescriptionDefault
n_groups_listlist[int]Number of groups per grouping factor.required
re_structurestrRandom effects structure type.required
metadatadict | NoneOptional metadata dict containing ‘random_names’ or ‘re_terms’ to determine correct number of theta parameters.None

Returns:

TypeDescription
ndarrayInitial theta values.
pirls_sparse
pirls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, y: np.ndarray, family: Family, eta_init: np.ndarray | None = None, prior_weights: np.ndarray | None = None, beta_fixed: np.ndarray | None = None, max_iter: int = 30, tol: float = 1e-07, verbose: bool = False, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict

Penalized Iteratively Reweighted Least Squares using sparse operations.

For fixed variance parameters encoded in Lambda, iteratively solves for beta and u using weighted penalized least squares. This is the inner loop of GLMM optimization.

Parameters:

NameTypeDescriptionDefault
XndarrayFixed effects design matrix (n, p)required
Zcsc_matrixRandom effects design matrix (n, q), scipy sparserequired
Lambdacsc_matrixCholesky-like factor from theta (q, q), scipy sparserequired
yndarrayResponse vector (n,)required
familyFamilyGLM family (binomial, poisson)required
eta_initndarray | NoneInitial linear predictor (optional)None
prior_weightsndarray | NonePrior observation weights (n,). For binomial with proportions, this should be the number of trials.None
beta_fixedndarray | NoneIf provided, solve only for u with beta held fixed. This is used in Stage 2 optimization to evaluate deviance at arbitrary (theta, beta) points without re-optimizing beta.None
max_iterintMaximum PIRLS iterations30
tolfloatConvergence tolerance on relative deviance change (1e-7 matches lme4)1e-07
verboseboolPrint iteration infoFalse
factor_S22‘SparseFactorization | None’Cached Cholesky factor for S22 block (L’Z’WZL + I). If provided, reuses symbolic analysis across calls. Useful for caching across theta evaluations in the optimizer.None
factor_S‘SparseFactorization | None’Cached Cholesky factor for full augmented system. Only used when beta_fixed is None (full solve mode).None
pattern_template‘PatternTemplate | None’PatternTemplate for preserving sparsity patterns. When provided, ensures S22 has consistent pattern for cross-theta caching. This is required for cross-theta factor caching to work correctly when theta has boundary values (theta=0).None

Returns:

TypeDescription
dictDictionary with keys:
dict- beta: Fixed effects coefficients (p,)
dict- u: Spherical random effects (q,)
dict- eta: Final linear predictor (n,)
dict- mu: Final fitted values (n,)
dict- deviance: GLMM deviance at convergence
dict- converged: Whether PIRLS converged
dict- n_iter: Number of iterations
dict- logdet_L: Log-determinant of (L'Z'WZL + I)
dict- factor_S22: Cholesky factor for S22 (for caching)
dict- factor_S: Cholesky factor for full system (for caching, full mode only)
solve_weighted_pls_sparse
solve_weighted_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, z: np.ndarray, weights: np.ndarray, beta_fixed: np.ndarray | None = None, ZL: sp.csc_matrix | None = None, ZL_dense: np.ndarray | None = None, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict

Solve weighted Penalized Least Squares for GLMM.

[X'WX     X'WZL      ] [beta]   [X'Wz  ]
[L'Z'WX   L'Z'WZL + I] [u   ] = [L'Z'Wz]

using sparse Cholesky factorization.

If beta_fixed is provided, solves only for u with beta held constant: [L’Z’WZL + I] [u] = [L’Z’W(z - X*beta)]

This is critical for Stage 2 optimization where we need to evaluate deviance at arbitrary (theta, beta) points without PIRLS re-optimizing beta.
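The beta-fixed mode can be sketched with SciPy sparse operations (an illustration only: it uses SciPy's sparse LU in place of the cached sparse Cholesky factorization this function uses, and omits logdet and the full-solve mode):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def solve_u_beta_fixed(X, ZL, z, w, beta):
    """Solve [ZL'W ZL + I] u = ZL'W (z - X beta) with beta held constant."""
    q = ZL.shape[1]
    W = sp.diags(w)
    # S22 block: the only system needed when beta is fixed.
    S22 = (ZL.T @ W @ ZL + sp.identity(q)).tocsc()
    rhs = ZL.T @ (w * (z - X @ beta))
    return splu(S22).solve(rhs)

rng = np.random.default_rng(2)
n, p, q = 15, 2, 4
X = rng.normal(size=(n, p))
ZL = sp.random(n, q, density=0.5, format="csc", random_state=3)
w = rng.uniform(0.5, 2.0, size=n)
z = rng.normal(size=n)
beta = np.array([0.2, -0.1])
u = solve_u_beta_fixed(X, ZL, z, w, beta)
```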

Parameters:

NameTypeDescriptionDefault
XndarrayFixed effects design matrix (n, p)required
Zcsc_matrixRandom effects design matrix (n, q), scipy sparse CSCrequired
Lambdacsc_matrixCholesky-like factor (q, q), scipy sparse CSCrequired
zndarrayWorking response (n,)required
weightsndarrayIRLS weights (n,)required
beta_fixedndarray | NoneIf provided, solve only for u with beta fixed to this value.None
ZLcsc_matrix | NonePre-computed Z @ Lambda (sparse). If provided, avoids redundant computation when called repeatedly with same Z and Lambda.None
ZL_densendarray | NonePre-computed ZL.toarray() (dense). If provided, avoids repeated sparse-to-dense materialization on each call.None
factor_S22‘SparseFactorization | None’Cached Cholesky factor for S22 block. If provided, uses cholesky_inplace() to update with new values (same sparsity pattern). This avoids expensive symbolic analysis on repeated calls.None
factor_S‘SparseFactorization | None’Cached Cholesky factor for full augmented system S. Only used in full solve mode (when beta_fixed is None).None
pattern_template‘PatternTemplate | None’PatternTemplate for preserving sparsity patterns. When provided, ensures S22 has consistent pattern for cross-theta caching. Required when using factor_S22 with cross-theta caching.None

Returns:

TypeDescription
dictDictionary with keys:
dict- beta: Fixed effects coefficients (p,)
dict- u: Spherical random effects (q,)
dict- logdet_L: Log-determinant of (L’Z’WZL + I)
dict- fitted_z: Fitted working response Xbeta + ZL*u
dict- factor_S22: Cholesky factor for S22 (for caching)
dict- factor_S: Cholesky factor for full system S (for caching, full mode only)

heuristics

Heuristics and numerical utilities for GLMM fitting.

This module contains initialization heuristics, bound computation, and finite-difference derivative routines used by the GLMM fitter in glmer.py.

Classes
Functions
compute_hessian_vcov
compute_hessian_vcov(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, delta: float = 0.0001, eta_init: np.ndarray | None = None) -> tuple[np.ndarray, dict]

Compute Hessian-based vcov for fixed effects.

Matches lme4’s use.hessian=TRUE approach:

1. Compute the numerical Hessian of the deviance w.r.t. [theta, beta]
2. Invert the Hessian
3. Extract the beta-beta block
4. Symmetrize

This uses lme4’s deriv12() approach (simple central differences with delta=1e-4) for exact parity with R.

Parameters:

NameTypeDescriptionDefault
thetandarrayFinal theta (variance component) estimatesrequired
betandarrayFinal beta (fixed effect) estimatesrequired
XndarrayFixed effects design matrix (n, p)required
Zcsc_matrixRandom effects design matrix (n, q), scipy sparserequired
yndarrayResponse vector (n,)required
familyFamilyGLM family (binomial, poisson)required
n_groups_listlist[int]Number of groups per grouping factorrequired
re_structurestrRandom effects structure typerequired
metadatadict | NoneAdditional RE metadataNone
prior_weightsndarray | NonePrior observation weightsNone
pirls_max_iterintMaximum PIRLS iterations per deviance evaluation30
pirls_tolfloatPIRLS convergence tolerance1e-07
deltafloatFinite difference step size (default 1e-4, matches lme4)0.0001
eta_initndarray | NoneInitial linear predictor for PIRLS. If provided, used as starting point for all deviance evaluations. This provides numerical stability for models with large random effects.None

Returns:

TypeDescription
tuple[ndarray, dict]Tuple of: - vcov_beta: Variance-covariance matrix for beta, shape (p, p) - derivs: Dictionary with “gradient” and “Hessian” (for diagnostics)

Notes: If the Hessian is not positive definite, this function returns the pseudo-inverse of the beta-beta block, with a warning.

deriv12
deriv12(func: Callable[[np.ndarray], float], x: np.ndarray, delta: float = 0.0001, fx: float | None = None) -> dict

Compute gradient and Hessian using central finite differences.

This matches lme4’s deriv12() function exactly (lme4/R/deriv.R). Uses simple central differences with step size delta.

The Hessian is computed simultaneously with the gradient for efficiency:

Parameters:

NameTypeDescriptionDefault
funcCallable[[ndarray], float]Scalar function R^n -> R to differentiate.required
xndarrayPoint at which to compute derivatives, shape (n,).required
deltafloatStep size for finite differences (default 1e-4, matches lme4).0.0001
fxfloat | NoneOptional pre-computed f(x) to avoid redundant evaluation.None

Returns:

TypeDescription
dictDictionary with: - “gradient”: First derivative vector, shape (n,) - “Hessian”: Second derivative matrix, shape (n, n)

Note: This is O(h^2) accurate, matching lme4’s approach. For higher accuracy (O(h^8)), use compute_hessian_richardson from

get_theta_lower_bounds
get_theta_lower_bounds(n_theta: int, re_structure: str, metadata: dict | None = None) -> list[float]

Get lower bounds for theta parameters.

Diagonal elements of Cholesky factor must be non-negative. Off-diagonal elements are unbounded.

Parameters:

NameTypeDescriptionDefault
n_thetaintNumber of theta parametersrequired
re_structurestrRandom effects structure typerequired
metadatadict | NoneOptional metadata dict with ‘re_structures_list’ for crossed/nested/mixed structuresNone

Returns:

TypeDescription
list[float]List of lower bounds
init_theta_glmm
init_theta_glmm(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> np.ndarray

Initialize theta for GLMM fitting.

Uses conservative defaults. Unlike LMMs, GLMMs work on the link scale so moment-based initialization is less useful.

Parameters:

NameTypeDescriptionDefault
n_groups_listlist[int]Number of groups per grouping factor.required
re_structurestrRandom effects structure type.required
metadatadict | NoneOptional metadata dict containing ‘random_names’ or ‘re_terms’ to determine correct number of theta parameters.None

Returns:

TypeDescription
ndarrayInitial theta values.

initialization

Moment-based initialization for LMM variance components.

Functions:

NameDescription
compute_moment_initCompute moment-based starting values for theta.
Functions
compute_moment_init
compute_moment_init(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, group_ids_list: list[np.ndarray], n_groups_list: list[int], re_structure: str, X_re: np.ndarray | None = None, metadata: dict | None = None) -> np.ndarray

Compute moment-based starting values for theta.

Dispatches to appropriate initializer based on re_structure.

Parameters:

NameTypeDescriptionDefault
XndarrayFixed effects design matrix, shape (n, p).required
Zcsc_matrixRandom effects design matrix, shape (n, q), scipy sparse.required
yndarrayResponse vector, shape (n,).required
group_ids_listlist[ndarray]List of group ID arrays (one per grouping factor). For simple models: single array of shape (n,) with values 0 to n_groups-1 For crossed/nested: multiple arraysrequired
n_groups_listlist[int]Number of groups per grouping factor.required
re_structurestrRandom effects structure type. Options: “intercept”, “slope”, “diagonal”, “crossed”, “nested”, “mixed”required
X_rendarray | NoneRandom effects design matrix, shape (n, r). For slopes: typically (n, 2) with columns [ones, predictor] Not needed for intercept-only models.None
metadatadict | NoneOptional metadata dict with structure information. For crossed/nested/mixed, should contain ‘re_structures_list’ specifying each factor’s structure (e.g., [‘slope’, ‘intercept’]).None

Returns:

TypeDescription
ndarrayInitial theta values, shape varies by structure:
ndarray- Intercept: (1,) or (n_factors,) for crossed/nested
ndarray- Slope: (3,) for 2x2 Cholesky
ndarray- Diagonal: (k,) for k RE per group
ndarray- Crossed/nested with mixed: concatenation per factor

Examples:

>>> import numpy as np
>>> import scipy.sparse as sp
>>> # Simple random intercept
>>> X = np.random.randn(100, 2)
>>> Z = sp.random(100, 10, density=0.1, format='csc')
>>> y = np.random.randn(100)
>>> group_ids = [np.repeat(np.arange(10), 10)]
>>> theta = compute_moment_init(X, Z, y, group_ids, [10], "intercept")
>>> theta.shape
(1,)

Notes:

lambda_builder

Lambda matrix construction from theta parameters.

This module is a re-export facade. Implementation lives in:

All public names are re-exported here for backward compatibility.

Classes:

NameDescription
LambdaTemplateTemplate for efficient Lambda matrix updates during optimization.
PatternTemplateTemplate for preserving sparsity patterns across theta evaluations.

Functions:

NameDescription
build_lambda_sparseBuild sparse block-diagonal Lambda matrix from theta.
build_lambda_templateBuild a Lambda template for efficient repeated updates.
build_pattern_templateBuild pattern template for cross-theta Cholesky caching.
cholesky_block_to_thetaConvert lower-triangular Cholesky block to theta vector.
compute_s22_preserve_patternCompute ZL’ @ W @ ZL + I while preserving S22_pattern’s sparsity.
compute_zl_preserve_patternCompute Z @ Lambda while preserving the sparsity pattern of ZL_pattern.
theta_to_cholesky_blockConvert theta vector to lower-triangular Cholesky block.
theta_to_variance_paramsConvert theta (Cholesky elements) to variance parameters.
update_lambda_from_templateUpdate Lambda matrix from template using new theta values.
Classes
LambdaTemplate
LambdaTemplate(template: sp.csc_matrix, theta_to_data_map: list[np.ndarray], re_structure: str, n_groups_list: list[int]) -> None

Template for efficient Lambda matrix updates during optimization.

Caches the sparsity pattern of Lambda so that repeated calls only update the data values, avoiding reconstruction of indices.
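The idea can be sketched with SciPy (an illustrative toy for a "slope" structure with 3 groups; the real template and its theta_to_data_map are produced by build_lambda_template):

```python
import numpy as np
import scipy.sparse as sp

# Build Lambda once with placeholder values, record which slots in .data each
# theta element owns, then update .data in place on every theta evaluation.
block = sp.csc_matrix(np.array([[1.0, 0.0], [1.0, 1.0]]))  # 2x2 lower triangle
template = sp.block_diag([block] * 3, format="csc")

# In CSC order each block contributes [L00, L10, L11] to template.data,
# so theta element t owns data positions t, t+3, t+6.
theta_to_data_map = [np.arange(t, 9, 3) for t in range(3)]

def update_lambda(template, theta_to_data_map, theta):
    # Only the values change; indices/indptr (the sparsity pattern) are reused.
    for t, positions in enumerate(theta_to_data_map):
        template.data[positions] = theta[t]
    return template

Lam = update_lambda(template, theta_to_data_map, np.array([1.5, 0.3, 1.2]))
```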

Attributes:

NameTypeDescription
templatecsc_matrixSparse CSC matrix with correct structure but placeholder values.
theta_to_data_maplist[ndarray]List mapping theta indices to positions in template.data. Each element is an int64 array of indices into template.data.
re_structurestrThe random effects structure type.
n_groups_listlist[int]Number of groups per factor.
Attributes
n_groups_list
n_groups_list: list[int]
re_structure
re_structure: str
template
template: sp.csc_matrix
theta_to_data_map
theta_to_data_map: list[np.ndarray]
PatternTemplate
PatternTemplate(ZL_pattern: sp.csc_matrix, S22_pattern: sp.csc_matrix, Z: sp.csc_matrix, lambda_template: 'LambdaTemplate') -> None

Template for preserving sparsity patterns across theta evaluations.

This is critical for cross-theta Cholesky caching. When theta has boundary values (theta=0), the resulting matrices would normally have fewer non-zeros. This template ensures the sparsity pattern remains constant by storing explicit zeros where needed.

Used to match lme4’s behavior where the sparsity pattern is fixed at initialization and never changes during optimization.

Attributes:

NameTypeDescription
ZL_patterncsc_matrixSparsity pattern of Z @ Lambda (computed with theta=ones).
S22_patterncsc_matrixSparsity pattern of ZL’ @ ZL + I (computed with theta=ones).
Zcsc_matrixReference to the original Z matrix for value computation.
lambda_template‘LambdaTemplate’Reference to the LambdaTemplate.
Attributes
S22_pattern
S22_pattern: sp.csc_matrix
Z
Z: sp.csc_matrix
ZL_pattern
ZL_pattern: sp.csc_matrix
lambda_template
lambda_template: 'LambdaTemplate'
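The explicit-zero idea can be seen in a toy sketch (illustrative code only, not this module's implementation; build_pattern_template constructs the real patterns):

```python
import numpy as np
import scipy.sparse as sp

# With a boundary theta, Z @ Lambda loses stored entries once zeros are
# pruned, so values are scattered into a copy of the fixed reference
# pattern instead of recomputing the structure.
Z = sp.csc_matrix(np.array([[1.0, 0.0],
                            [0.0, 2.0],
                            [3.0, 0.0]]))
Lam_ones = sp.diags([1.0, 1.0], format="csc")   # reference theta = ones
Lam_zero = sp.diags([1.0, 0.0], format="csc")   # boundary theta

pattern = (Z @ Lam_ones).tocsc()                # maximal pattern: 3 stored entries
ZL = (Z @ Lam_zero).tocsc()
ZL.eliminate_zeros()                            # pruning shrinks the pattern
assert ZL.nnz == 2

# Pattern-preserving product: keep the reference structure, refresh values.
ZL_fixed = pattern.copy()
dense_vals = (Z @ Lam_zero).toarray()
for j in range(ZL_fixed.shape[1]):              # fill stored entries per CSC column
    start, end = ZL_fixed.indptr[j], ZL_fixed.indptr[j + 1]
    ZL_fixed.data[start:end] = dense_vals[ZL_fixed.indices[start:end], j]
assert ZL_fixed.nnz == 3                        # explicit zero retained
```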
Functions
build_lambda_sparse
build_lambda_sparse(theta: np.ndarray, n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> sp.csc_matrix

Build sparse block-diagonal Lambda matrix from theta.

Lambda is constructed based on the random effects structure:

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major).required
n_groups_listlist[int]Number of groups per grouping factor. For simple models: [n_groups] For nested/crossed: [n_groups_factor1, n_groups_factor2, ...]required
re_structurestrRandom effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed”required
metadatadict | NoneOptional metadata dict with structure information. May contain ‘re_structures_list’ for mixed models.None

Returns:

TypeDescription
csc_matrixSparse block-diagonal Lambda matrix, CSC format.

Examples:

>>> # Random intercept: (1|group)
>>> theta = np.array([1.5])
>>> Lambda = build_lambda_sparse(theta, [10], "intercept")
>>> Lambda.shape
(10, 10)
>>> # Correlated slopes: (1+x|group)
>>> theta = np.array([1.5, 0.3, 1.2])  # [L00, L10, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "slope")
>>> Lambda.shape
(36, 36)  # 18 groups x 2 RE per group
>>> # Uncorrelated slopes: (1+x||group)
>>> theta = np.array([1.5, 1.2])  # [L00, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "diagonal")
>>> Lambda.shape
(36, 36)

Notes:

build_lambda_template
build_lambda_template(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> LambdaTemplate

Build a Lambda template for efficient repeated updates.

Creates a sparse matrix template with the correct sparsity pattern, plus a mapping from theta indices to data array positions.

Parameters:

NameTypeDescriptionDefault
n_groups_listlist[int]Number of groups per grouping factor.required
re_structurestrRandom effects structure type.required
metadatadict | NoneOptional metadata dict with structure information.None

Returns:

TypeDescription
LambdaTemplateLambdaTemplate with template matrix and theta-to-data mapping.

Examples:

>>> # Create template for random intercept model
>>> template = build_lambda_template([18], "intercept")
>>> template.template.shape
(18, 18)
>>> # Update with new theta values
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda.data[0]
1.5

Notes:

build_pattern_template
build_pattern_template(Z: sp.csc_matrix, lambda_template: LambdaTemplate) -> PatternTemplate

Build pattern template for cross-theta Cholesky caching.

Computes the “maximal” sparsity patterns of ZL and S22 using theta=ones. These patterns are used to ensure consistent sparsity across all theta evaluations, even when theta components are zero (boundary conditions).

This matches lme4’s approach where the sparsity pattern is fixed at model initialization and never changes during optimization.

Parameters:

NameTypeDescriptionDefault
Zcsc_matrixRandom effects design matrix (n, q), scipy sparse CSC.required
lambda_templateLambdaTemplateLambdaTemplate with pattern information.required

Returns:

TypeDescription
PatternTemplatePatternTemplate with ZL and S22 patterns.

Examples:

>>> Z = sp.random(100, 50, density=0.1, format='csc')
>>> lambda_tpl = build_lambda_template([50], "intercept")
>>> pattern_tpl = build_pattern_template(Z, lambda_tpl)
>>> # Now use pattern_tpl for pattern-preserved computations
cholesky_block_to_theta
cholesky_block_to_theta(L: np.ndarray) -> np.ndarray

Convert lower-triangular Cholesky block to theta vector.

Parameters:

NameTypeDescriptionDefault
LndarrayLower triangular Cholesky block, shape (dim, dim).required

Returns:

TypeDescription
ndarrayTheta vector (lower triangle elements, column-major).

Examples:

>>> L = np.array([[1.5, 0.0], [0.3, 1.2]])
>>> theta = cholesky_block_to_theta(L)
>>> theta
array([1.5, 0.3, 1.2])

Notes:

compute_s22_preserve_pattern
compute_s22_preserve_pattern(ZL: sp.csc_matrix, W: sp.csc_matrix, S22_pattern: sp.csc_matrix) -> sp.csc_matrix

Compute ZL’ @ W @ ZL + I while preserving S22_pattern’s sparsity.

Similar to compute_zl_preserve_pattern, this ensures S22 always has the same sparsity pattern regardless of theta values.

Parameters:

NameTypeDescriptionDefault
ZLcsc_matrixPattern-preserved ZL matrix from compute_zl_preserve_pattern.required
Wcsc_matrixDiagonal weight matrix.required
S22_patterncsc_matrixThe “maximal” S22 pattern from build_pattern_template.required

Returns:

TypeDescription
csc_matrixS22 with S22_pattern’s sparsity structure but current values.
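The quantity being formed can be illustrated with random data (a sketch, not a call into this function):

```python
import numpy as np
import scipy.sparse as sp

# S22 = ZL' @ W @ ZL + I with a positive diagonal W is symmetric and has
# all eigenvalues >= 1, which is why its Cholesky factorization is safe.
rng = np.random.default_rng(0)
ZL = sp.random(20, 6, density=0.3, format="csc", random_state=0)
W = sp.diags(rng.uniform(0.5, 2.0, size=20), format="csc")

S22 = (ZL.T @ W @ ZL + sp.eye(6, format="csc")).toarray()
assert np.allclose(S22, S22.T)                        # symmetric
assert np.all(np.linalg.eigvalsh(S22) >= 1.0 - 1e-9)  # eigenvalues >= 1
```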
compute_zl_preserve_pattern
compute_zl_preserve_pattern(Z: sp.csc_matrix, Lambda: sp.csc_matrix, ZL_pattern: sp.csc_matrix) -> sp.csc_matrix

Compute Z @ Lambda while preserving the sparsity pattern of ZL_pattern.

When theta has zeros (boundary), scipy’s Z @ Lambda prunes explicit zeros, changing the sparsity pattern. This function ensures the result always has the same pattern as ZL_pattern by adding explicit zeros where needed.

This is critical for CHOLMOD caching: the factorization reuses the symbolic analysis, which depends on having a consistent sparsity pattern.

Parameters:

NameTypeDescriptionDefault
Zcsc_matrixRandom effects design matrix (n, q).required
Lambdacsc_matrixCurrent Lambda matrix (may have zeros from boundary theta).required
ZL_patterncsc_matrixThe “maximal” ZL pattern from build_pattern_template.required

Returns:

TypeDescription
csc_matrixZL with ZL_pattern’s sparsity structure but current values.

Notes:

theta_to_cholesky_block
theta_to_cholesky_block(theta: np.ndarray, dim: int) -> np.ndarray

Convert theta vector to lower-triangular Cholesky block.

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major). For 2x2: [L00, L10, L11], length = 3 For 3x3: [L00, L10, L11, L20, L21, L22], length = 6required
dimintDimension of the Cholesky block (2, 3, etc.).required

Returns:

TypeDescription
ndarrayLower triangular Cholesky block, shape (dim, dim).

Examples:

>>> theta = np.array([1.5, 0.3, 1.2])
>>> L = theta_to_cholesky_block(theta, 2)
>>> L
array([[1.5, 0. ],
       [0.3, 1.2]])

Notes:

theta_to_variance_params
theta_to_variance_params(theta: np.ndarray, is_diagonal: bool = False) -> tuple[np.ndarray, np.ndarray]

Convert theta (Cholesky elements) to variance parameters.

Computes standard deviations and correlations from Cholesky factor.

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major). For intercept: [L00], length = 1 For slopes (2D): [L00, L10, L11], length = 3 For diagonal (2D): [L00, L11], length = 2required
is_diagonalboolIf True, theta contains only diagonal elements (no off-diagonal correlations).False

Returns:

TypeDescription
ndarrayTuple of (standard_deviations, correlations):
ndarray- standard_deviations: sqrt(diag(Sigma)), shape (n,)
tuple[ndarray, ndarray]- correlations: off-diagonal correlations, shape (n*(n-1)/2,)

Examples:

>>> # Correlated slopes
>>> theta = np.array([1.5, 0.3, 1.2])
>>> sds, rhos = theta_to_variance_params(theta)
>>> sds  # sqrt(diag(L @ L.T))
array([1.5       , 1.23693169])
>>> rhos  # correlation between intercept and slope
array([0.24253563])
>>> # Uncorrelated slopes
>>> theta = np.array([1.5, 1.2])
>>> sds, rhos = theta_to_variance_params(theta, is_diagonal=True)
>>> sds
array([1.5, 1.2])
>>> rhos  # empty for diagonal structure
array([])

Notes:

update_lambda_from_template
update_lambda_from_template(template: LambdaTemplate, theta: np.ndarray) -> sp.csc_matrix

Update Lambda matrix from template using new theta values.

Efficiently updates only the data values without rebuilding structure.

Parameters:

NameTypeDescriptionDefault
templateLambdaTemplateLambdaTemplate created by build_lambda_template.required
thetandarrayNew theta values.required

Returns:

TypeDescription
csc_matrixUpdated sparse Lambda matrix in CSC format.

Examples:

>>> template = build_lambda_template([18], "intercept")
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda[0, 0]
1.5

Notes:

lambda_sparse

Sparse Lambda matrix construction and theta-Cholesky conversions.

Lambda is the Cholesky-like factor of the random effects covariance matrix. It is block-diagonal, with blocks corresponding to grouping factors.

Theta Parameterization (lme4-compatible):

Theta contains elements of the lower triangular Cholesky factor L, where the random effects covariance is Sigma = sigma^2 * Lambda * Lambda’ and Lambda is built from theta.

Theta is on the relative scale (divided by sigma), following lme4’s convention.

Examples:
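A minimal numeric sketch of the parameterization (sigma and theta values are illustrative):

```python
import numpy as np

# theta = [L00, L10, L11]: lower triangle of the relative Cholesky
# factor for a 2x2 random-effects block, column-major.
theta = np.array([1.5, 0.3, 1.2])
L = np.array([[theta[0], 0.0],
              [theta[1], theta[2]]])

sigma = 2.0                            # residual SD (illustrative)
Sigma = sigma**2 * (L @ L.T)           # Sigma = sigma^2 * Lambda * Lambda'

sds = np.sqrt(np.diag(Sigma))          # SDs on the data scale: [3.0, ~2.474]
rho = Sigma[0, 1] / (sds[0] * sds[1])  # correlation: ~0.2425
```

Because theta is on the relative scale, rescaling sigma rescales the random-effect SDs but leaves the correlation unchanged.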

During optimization, theta is constrained: diagonal elements are bounded below at 0, while off-diagonal elements are unbounded.

Functions:

NameDescription
build_lambda_sparseBuild sparse block-diagonal Lambda matrix from theta.
cholesky_block_to_thetaConvert lower-triangular Cholesky block to theta vector.
theta_to_cholesky_blockConvert theta vector to lower-triangular Cholesky block.
theta_to_variance_paramsConvert theta (Cholesky elements) to variance parameters.
Functions
build_lambda_sparse
build_lambda_sparse(theta: np.ndarray, n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> sp.csc_matrix

Build sparse block-diagonal Lambda matrix from theta.

Lambda is constructed based on the random effects structure:

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major).required
n_groups_listlist[int]Number of groups per grouping factor. For simple models: [n_groups] For nested/crossed: [n_groups_factor1, n_groups_factor2, ...]required
re_structurestrRandom effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed”required
metadatadict | NoneOptional metadata dict with structure information. May contain ‘re_structures_list’ for mixed models.None

Returns:

TypeDescription
csc_matrixSparse block-diagonal Lambda matrix, CSC format.

Examples:

>>> # Random intercept: (1|group)
>>> theta = np.array([1.5])
>>> Lambda = build_lambda_sparse(theta, [10], "intercept")
>>> Lambda.shape
(10, 10)
>>> # Correlated slopes: (1+x|group)
>>> theta = np.array([1.5, 0.3, 1.2])  # [L00, L10, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "slope")
>>> Lambda.shape
(36, 36)  # 18 groups x 2 RE per group
>>> # Uncorrelated slopes: (1+x||group)
>>> theta = np.array([1.5, 1.2])  # [L00, L11]
>>> Lambda = build_lambda_sparse(theta, [18], "diagonal")
>>> Lambda.shape
(36, 36)

Notes:

cholesky_block_to_theta
cholesky_block_to_theta(L: np.ndarray) -> np.ndarray

Convert lower-triangular Cholesky block to theta vector.

Parameters:

NameTypeDescriptionDefault
LndarrayLower triangular Cholesky block, shape (dim, dim).required

Returns:

TypeDescription
ndarrayTheta vector (lower triangle elements, column-major).

Examples:

>>> L = np.array([[1.5, 0.0], [0.3, 1.2]])
>>> theta = cholesky_block_to_theta(L)
>>> theta
array([1.5, 0.3, 1.2])

Notes:

theta_to_cholesky_block
theta_to_cholesky_block(theta: np.ndarray, dim: int) -> np.ndarray

Convert theta vector to lower-triangular Cholesky block.

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major). For 2x2: [L00, L10, L11], length = 3 For 3x3: [L00, L10, L11, L20, L21, L22], length = 6required
dimintDimension of the Cholesky block (2, 3, etc.).required

Returns:

TypeDescription
ndarrayLower triangular Cholesky block, shape (dim, dim).

Examples:

>>> theta = np.array([1.5, 0.3, 1.2])
>>> L = theta_to_cholesky_block(theta, 2)
>>> L
array([[1.5, 0. ],
       [0.3, 1.2]])

Notes:

theta_to_variance_params
theta_to_variance_params(theta: np.ndarray, is_diagonal: bool = False) -> tuple[np.ndarray, np.ndarray]

Convert theta (Cholesky elements) to variance parameters.

Computes standard deviations and correlations from Cholesky factor.

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major). For intercept: [L00], length = 1 For slopes (2D): [L00, L10, L11], length = 3 For diagonal (2D): [L00, L11], length = 2required
is_diagonalboolIf True, theta contains only diagonal elements (no off-diagonal correlations).False

Returns:

TypeDescription
ndarrayTuple of (standard_deviations, correlations):
ndarray- standard_deviations: sqrt(diag(Sigma)), shape (n,)
tuple[ndarray, ndarray]- correlations: off-diagonal correlations, shape (n*(n-1)/2,)

Examples:

>>> # Correlated slopes
>>> theta = np.array([1.5, 0.3, 1.2])
>>> sds, rhos = theta_to_variance_params(theta)
>>> sds  # sqrt(diag(L @ L.T))
array([1.5       , 1.23693169])
>>> rhos  # correlation between intercept and slope
array([0.24253563])
>>> # Uncorrelated slopes
>>> theta = np.array([1.5, 1.2])
>>> sds, rhos = theta_to_variance_params(theta, is_diagonal=True)
>>> sds
array([1.5, 1.2])
>>> rhos  # empty for diagonal structure
array([])

Notes:

lambda_template

Lambda matrix templates and sparsity pattern preservation.

Provides LambdaTemplate and PatternTemplate for efficient repeated Lambda construction during optimization. Templates cache sparsity patterns so that only data values need updating when theta changes.

Pattern preservation functions ensure consistent sparsity across all theta evaluations, even at boundary conditions (theta=0), matching lme4’s behavior.

Classes:

NameDescription
LambdaTemplateTemplate for efficient Lambda matrix updates during optimization.
PatternTemplateTemplate for preserving sparsity patterns across theta evaluations.

Functions:

NameDescription
build_lambda_templateBuild a Lambda template for efficient repeated updates.
build_pattern_templateBuild pattern template for cross-theta Cholesky caching.
compute_s22_preserve_patternCompute ZL’ @ W @ ZL + I while preserving S22_pattern’s sparsity.
compute_zl_preserve_patternCompute Z @ Lambda while preserving the sparsity pattern of ZL_pattern.
infer_diagonal_dimInfer dimension for diagonal structure from metadata.
infer_slope_dimInfer dimension for slope structure from metadata.
update_lambda_from_templateUpdate Lambda matrix from template using new theta values.
Classes
LambdaTemplate
LambdaTemplate(template: sp.csc_matrix, theta_to_data_map: list[np.ndarray], re_structure: str, n_groups_list: list[int]) -> None

Template for efficient Lambda matrix updates during optimization.

Caches the sparsity pattern of Lambda so that repeated calls only update the data values, avoiding reconstruction of indices.

Attributes:

NameTypeDescription
templatecsc_matrixSparse CSC matrix with correct structure but placeholder values.
theta_to_data_maplist[ndarray]List mapping theta indices to positions in template.data. Each element is an int64 array of indices into template.data.
re_structurestrThe random effects structure type.
n_groups_listlist[int]Number of groups per factor.
Attributes
n_groups_list
n_groups_list: list[int]
re_structure
re_structure: str
template
template: sp.csc_matrix
theta_to_data_map
theta_to_data_map: list[np.ndarray]
PatternTemplate
PatternTemplate(ZL_pattern: sp.csc_matrix, S22_pattern: sp.csc_matrix, Z: sp.csc_matrix, lambda_template: 'LambdaTemplate') -> None

Template for preserving sparsity patterns across theta evaluations.

This is critical for cross-theta Cholesky caching. When theta has boundary values (theta=0), the resulting matrices would normally have fewer non-zeros. This template ensures the sparsity pattern remains constant by storing explicit zeros where needed.

Used to match lme4’s behavior where the sparsity pattern is fixed at initialization and never changes during optimization.

Attributes:

NameTypeDescription
ZL_patterncsc_matrixSparsity pattern of Z @ Lambda (computed with theta=ones).
S22_patterncsc_matrixSparsity pattern of ZL’ @ ZL + I (computed with theta=ones).
Zcsc_matrixReference to the original Z matrix for value computation.
lambda_template‘LambdaTemplate’Reference to the LambdaTemplate.
Attributes
S22_pattern
S22_pattern: sp.csc_matrix
Z
Z: sp.csc_matrix
ZL_pattern
ZL_pattern: sp.csc_matrix
lambda_template
lambda_template: 'LambdaTemplate'
Functions
build_lambda_template
build_lambda_template(n_groups_list: list[int], re_structure: str, metadata: dict | None = None) -> LambdaTemplate

Build a Lambda template for efficient repeated updates.

Creates a sparse matrix template with the correct sparsity pattern, plus a mapping from theta indices to data array positions.

Parameters:

NameTypeDescriptionDefault
n_groups_listlist[int]Number of groups per grouping factor.required
re_structurestrRandom effects structure type.required
metadatadict | NoneOptional metadata dict with structure information.None

Returns:

TypeDescription
LambdaTemplateLambdaTemplate with template matrix and theta-to-data mapping.

Examples:

>>> # Create template for random intercept model
>>> template = build_lambda_template([18], "intercept")
>>> template.template.shape
(18, 18)
>>> # Update with new theta values
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda.data[0]
1.5

Notes:

build_pattern_template
build_pattern_template(Z: sp.csc_matrix, lambda_template: LambdaTemplate) -> PatternTemplate

Build pattern template for cross-theta Cholesky caching.

Computes the “maximal” sparsity patterns of ZL and S22 using theta=ones. These patterns are used to ensure consistent sparsity across all theta evaluations, even when theta components are zero (boundary conditions).

This matches lme4’s approach where the sparsity pattern is fixed at model initialization and never changes during optimization.

Parameters:

NameTypeDescriptionDefault
Zcsc_matrixRandom effects design matrix (n, q), scipy sparse CSC.required
lambda_templateLambdaTemplateLambdaTemplate with pattern information.required

Returns:

TypeDescription
PatternTemplatePatternTemplate with ZL and S22 patterns.

Examples:

>>> Z = sp.random(100, 50, density=0.1, format='csc')
>>> lambda_tpl = build_lambda_template([50], "intercept")
>>> pattern_tpl = build_pattern_template(Z, lambda_tpl)
>>> # Now use pattern_tpl for pattern-preserved computations
compute_s22_preserve_pattern
compute_s22_preserve_pattern(ZL: sp.csc_matrix, W: sp.csc_matrix, S22_pattern: sp.csc_matrix) -> sp.csc_matrix

Compute ZL’ @ W @ ZL + I while preserving S22_pattern’s sparsity.

Similar to compute_zl_preserve_pattern, this ensures S22 always has the same sparsity pattern regardless of theta values.

Parameters:

NameTypeDescriptionDefault
ZLcsc_matrixPattern-preserved ZL matrix from compute_zl_preserve_pattern.required
Wcsc_matrixDiagonal weight matrix.required
S22_patterncsc_matrixThe “maximal” S22 pattern from build_pattern_template.required

Returns:

TypeDescription
csc_matrixS22 with S22_pattern’s sparsity structure but current values.
compute_zl_preserve_pattern
compute_zl_preserve_pattern(Z: sp.csc_matrix, Lambda: sp.csc_matrix, ZL_pattern: sp.csc_matrix) -> sp.csc_matrix

Compute Z @ Lambda while preserving the sparsity pattern of ZL_pattern.

When theta has zeros (boundary), scipy’s Z @ Lambda prunes explicit zeros, changing the sparsity pattern. This function ensures the result always has the same pattern as ZL_pattern by adding explicit zeros where needed.

This is critical for CHOLMOD caching: the factorization reuses the symbolic analysis, which depends on having a consistent sparsity pattern.

Parameters:

NameTypeDescriptionDefault
Zcsc_matrixRandom effects design matrix (n, q).required
Lambdacsc_matrixCurrent Lambda matrix (may have zeros from boundary theta).required
ZL_patterncsc_matrixThe “maximal” ZL pattern from build_pattern_template.required

Returns:

TypeDescription
csc_matrixZL with ZL_pattern’s sparsity structure but current values.

Notes:

infer_diagonal_dim
infer_diagonal_dim(metadata: dict) -> int

Infer dimension for diagonal structure from metadata.

Parameters:

NameTypeDescriptionDefault
metadatadictMetadata dict possibly containing ‘re_terms’ or ‘random_names’.required

Returns:

TypeDescription
intDimension of the diagonal structure.
infer_slope_dim
infer_slope_dim(metadata: dict) -> int

Infer dimension for slope structure from metadata.

Parameters:

NameTypeDescriptionDefault
metadatadictMetadata dict possibly containing ‘re_terms’ or ‘random_names’.required

Returns:

TypeDescription
intDimension of the slope structure.
update_lambda_from_template
update_lambda_from_template(template: LambdaTemplate, theta: np.ndarray) -> sp.csc_matrix

Update Lambda matrix from template using new theta values.

Efficiently updates only the data values without rebuilding structure.

Parameters:

NameTypeDescriptionDefault
templateLambdaTemplateLambdaTemplate created by build_lambda_template.required
thetandarrayNew theta values.required

Returns:

TypeDescription
csc_matrixUpdated sparse Lambda matrix in CSC format.

Examples:

>>> template = build_lambda_template([18], "intercept")
>>> Lambda = update_lambda_from_template(template, np.array([1.5]))
>>> Lambda[0, 0]
1.5

Notes:

lmer

Core LMM operations: PLS solving and deviance computation.

This module implements the Penalized Least Squares (PLS) formulation for linear mixed models, matching lme4’s canonical approach.

minimize: ||y - Xβ - ZΛu||² + ||u||²

where:

Architecture:

lme4 factors only S22 = Λ’Z’ZΛ + I (not the full augmented system) and uses the Schur complement to solve for beta. See:

Classes:

NameDescription
PLSInvariantsPre-computed quantities that don’t change during theta optimization.

Functions:

NameDescription
apply_sqrt_weightsApply sqrt(weights) transformation to design matrices and response.
compute_pls_invariantsPre-compute quantities that are constant during optimization.
lmm_deviance_sparseCompute LMM deviance for optimization.
solve_pls_sparseSolve Penalized Least Squares system using Schur complement.
Classes
PLSInvariants
PLSInvariants(XtX: np.ndarray, Xty: np.ndarray, X_backend: np.ndarray, y_backend: np.ndarray) -> None

Pre-computed quantities that don’t change during theta optimization.

These are computed once when the model is set up and reused for every deviance evaluation during BOBYQA optimization.

Attributes:

NameTypeDescription
XtXndarrayGram matrix X’X, shape (p, p). Backend array (JAX or NumPy).
XtyndarrayCross-product X’y, shape (p,). Backend array.
X_backendndarrayFixed effects design matrix as backend array, shape (n, p).
y_backendndarrayResponse vector as backend array, shape (n,).
Attributes
X_backend
X_backend: np.ndarray
XtX
XtX: np.ndarray
Xty
Xty: np.ndarray
y_backend
y_backend: np.ndarray
Functions
apply_sqrt_weights
apply_sqrt_weights(X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, weights: np.ndarray | None) -> tuple[np.ndarray, sp.csc_matrix, np.ndarray, np.ndarray | None]

Apply sqrt(weights) transformation to design matrices and response.

Transforms the weighted LMM problem into an unweighted problem on transformed data. This is the standard approach from lme4 and MixedModels.jl.

minimize ||W^{1/2}(y - Xβ - ZΛu)||² + ||u||²

After transformation (X_w, Z_w, y_w = W^{1/2}X, W^{1/2}Z, W^{1/2}y): minimize ||y_w - X_w·β - Z_w·Λu||² + ||u||²

This is the standard unweighted PLS, so the same solver applies.

Parameters:

NameTypeDescriptionDefault
XndarrayFixed effects design matrix, shape (n, p).required
Zcsc_matrixRandom effects design matrix, sparse csc, shape (n, q).required
yndarrayResponse vector, shape (n,).required
weightsndarray | NoneObservation weights, shape (n,), or None for unweighted.required

Returns:

TypeDescription
tuple[ndarray, csc_matrix, ndarray, ndarray | None]Tuple of (X_w, Z_w, y_w, sqrtwts) where: X_w: Transformed X, shape (n, p) Z_w: Transformed Z, sparse csc, shape (n, q) y_w: Transformed y, shape (n,) sqrtwts: sqrt(weights), shape (n,), or None if weights is None
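The equivalence of the weighted and transformed problems can be checked directly (a sketch with random data and Lambda = I; not a call into this function):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(1)
n, p, q = 50, 2, 5
X = rng.standard_normal((n, p))
Z = sp.random(n, q, density=0.2, format="csc", random_state=1)
y = rng.standard_normal(n)
w = rng.uniform(0.5, 2.0, size=n)

# The sqrt(weights) transformation.
sqrtw = np.sqrt(w)
X_w = X * sqrtw[:, None]
Z_w = sp.diags(sqrtw) @ Z
y_w = y * sqrtw

# For any (beta, u), the weighted residual norm on the original data
# equals the unweighted norm on the transformed data.
beta = rng.standard_normal(p)
u = rng.standard_normal(q)
r = y - X @ beta - Z @ u               # Lambda = I for simplicity
weighted = np.sum(w * r**2)
unweighted = np.sum((y_w - X_w @ beta - Z_w @ u)**2)
assert np.isclose(weighted, unweighted)
```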
compute_pls_invariants
compute_pls_invariants(X: np.ndarray, y: np.ndarray) -> PLSInvariants

Pre-compute quantities that are constant during optimization.

Call this once before the optimization loop to avoid redundant computation. X’X and X’y don’t depend on theta, so computing them once saves work.

Parameters:

NameTypeDescriptionDefault
XndarrayFixed effects design matrix, shape (n, p).required
yndarrayResponse vector, shape (n,).required

Returns:

TypeDescription
PLSInvariantsPLSInvariants with pre-computed X’X, X’y, and backend arrays.

Examples:

>>> X = np.random.randn(100, 3)
>>> y = np.random.randn(100)
>>> inv = compute_pls_invariants(X, y)
>>> inv.XtX.shape
(3, 3)
lmm_deviance_sparse
lmm_deviance_sparse(theta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, n_groups_list: list[int], re_structure: str, method: str = 'REML', lambda_template: 'LambdaTemplate | None' = None, pls_invariants: PLSInvariants | None = None, metadata: dict | None = None, sqrtwts: np.ndarray | None = None) -> float

Compute LMM deviance for optimization.

The deviance is -2 * log-likelihood, profiled over β and σ².

-2 log L = log|Λ’Z’ZΛ+I| + n log(2π PWRSS/n) + n

-2 REML = log|Λ’Z’ZΛ+I| + log|X’V⁻¹X| + (n-p) log(2π PWRSS/(n-p)) + (n-p)

For weighted models, the deviance includes an additional adjustment: -log|W| = -2*sum(log(sqrtwts))

This follows lme4 and MixedModels.jl: X, Z, y should already be pre-transformed by sqrt(weights) using apply_sqrt_weights().

Parameters:

NameTypeDescriptionDefault
thetandarrayCholesky factor elements (lower triangle, column-major).required
XndarrayFixed effects design matrix, shape (n, p). Should be pre-transformed by sqrt(weights) if weighted.required
Zcsc_matrixRandom effects design matrix, shape (n, q), scipy sparse. Should be pre-transformed by sqrt(weights) if weighted.required
yndarrayResponse vector, shape (n,). Should be pre-transformed by sqrt(weights) if weighted.required
n_groups_listlist[int]Number of groups per grouping factor.required
re_structurestrRandom effects structure type. Options: “intercept”, “slope”, “diagonal”, “nested”, “crossed”, “mixed”required
methodstrEstimation method, either “ML” or “REML”.‘REML’
lambda_template‘LambdaTemplate | None’Optional pre-built template for efficient Lambda updates. If provided, uses update_lambda_from_template instead of build_lambda_sparse. This avoids rebuilding the sparsity pattern on each call.None
pls_invariantsPLSInvariants | NoneOptional pre-computed invariants (X’X, X’y, JAX arrays). If provided, avoids redundant matrix computation in optimization loop. Use compute_pls_invariants() to create this before the loop.None
metadatadict | NoneOptional metadata dict with structure information. Required for crossed/nested/mixed structures with non-intercept factors. Should contain ‘re_structures_list’ specifying each factor’s structure.None
sqrtwtsndarray | NoneOptional sqrt(weights), shape (n,). If provided, adds the log-determinant adjustment -2*sum(log(sqrtwts)) to the deviance. This compensates for the Jacobian from the weight transformation.None

Returns:

TypeDescription
floatDeviance value (scalar).

Examples:

>>> import numpy as np
>>> import scipy.sparse as sp
>>> # Simple random intercept model
>>> theta = np.array([1.5])
>>> X = np.random.randn(100, 2)
>>> Z = sp.random(100, 10, density=0.1, format='csc')
>>> y = np.random.randn(100)
>>> dev = lmm_deviance_sparse(theta, X, Z, y, [10], "intercept", "REML")
>>> dev > 0
True
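The deviance formulas above can also be checked arithmetically; a sketch with illustrative component values (not a call into lmm_deviance_sparse):

```python
import numpy as np

# Illustrative values for the profiled-deviance components.
n, p = 100, 3
logdet = 4.2        # log|Lambda'Z'Z Lambda + I|
logdet_RX = 1.7     # log|X'V^-1 X| term (REML only)
pwrss = 250.0       # penalized weighted RSS at the PLS solution

dev_ml = logdet + n * np.log(2 * np.pi * pwrss / n) + n
dev_reml = (logdet + logdet_RX
            + (n - p) * np.log(2 * np.pi * pwrss / (n - p)) + (n - p))

# Weighted models add the Jacobian of the sqrt(weights) transform,
# -log|W| = -2 * sum(log(sqrtwts)):
sqrtwts = np.full(n, np.sqrt(2.0))   # all prior weights equal to 2
adjustment = -2.0 * np.sum(np.log(sqrtwts))
assert np.isclose(adjustment, -n * np.log(2.0))   # -log|W| = -log(2^n)
```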

Notes:

solve_pls_sparse
solve_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, y: np.ndarray, pls_invariants: PLSInvariants | None = None) -> dict

Solve Penalized Least Squares system using Schur complement.

Uses lme4’s algorithm: factor only S22 = Λ’Z’ZΛ + I, then use Schur complement to solve for beta. This avoids factoring the full augmented system.

Algorithm (matching lme4/src/predModule.cpp): 1. Factor S22 = Λ’Z’ZΛ + I via CHOLMOD 2. Compute RZX = L^{-1} * Λ’Z’X (forward solve) 3. Compute Schur = X’X - RZX’ * RZX (rank update) 4. Solve for beta via Schur complement 5. Back-solve for u

ZΛ is kept sparse throughout. For InstEval-scale data (73k obs × 4k groups), this uses ~50MB instead of ~2.4GB per iteration.

Parameters:

NameTypeDescriptionDefault
XndarrayFixed effects design matrix, shape (n, p).required
Zcsc_matrixRandom effects design matrix, shape (n, q), scipy sparse CSC.required
Lambdacsc_matrixCholesky-like factor, shape (q, q), scipy sparse CSC.required
yndarrayResponse vector, shape (n,).required
pls_invariantsPLSInvariants | NoneOptional pre-computed invariants (X’X, X’y, backend arrays). If provided, avoids redundant computation in optimization loop. Use compute_pls_invariants() to create this before the loop.None

Returns:

TypeDescription
dictDictionary with keys:
dict- beta: Fixed effects coefficients, shape (p,).
dict- u: Spherical random effects, shape (q,).
dict- logdet_L: Log-determinant of (Λ’Z’ZΛ + I).
dict- rss: Residual sum of squares
dict- logdet_RX: Log-determinant of Schur complement (for REML).

Examples:

>>> import numpy as np
>>> import scipy.sparse as sp
>>> n, p, q = 100, 3, 10
>>> X = np.random.randn(n, p)
>>> Z = sp.random(n, q, density=0.1, format='csc')
>>> Lambda = sp.eye(q, format='csc')
>>> y = np.random.randn(n)
>>> result = solve_pls_sparse(X, Z, Lambda, y)
>>> result['beta'].shape
(3,)
>>> result['u'].shape
(10,)
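The five-step Schur-complement algorithm can be sketched with SciPy's splu standing in for CHOLMOD (the helper name solve_pls_schur and all data are illustrative, not this module's implementation):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def solve_pls_schur(X, Z, Lambda, y):
    """Sketch: factor only S22 and recover beta via the Schur complement."""
    ZL = (Z @ Lambda).tocsc()
    S22 = (ZL.T @ ZL + sp.eye(ZL.shape[1], format="csc")).tocsc()
    lu = splu(S22)                        # factor S22 only (CHOLMOD in lme4)
    ZLtX = np.asarray(ZL.T @ X)           # dense (q, p)
    ZLty = ZL.T @ y
    schur = X.T @ X - ZLtX.T @ lu.solve(ZLtX)   # X'X - RZX'RZX
    rhs = X.T @ y - ZLtX.T @ lu.solve(ZLty)
    beta = np.linalg.solve(schur, rhs)          # solve for beta
    u = lu.solve(ZLty - ZLtX @ beta)            # back-solve for u
    return beta, u

# Cross-check against the dense augmented normal equations.
rng = np.random.default_rng(2)
n, p, q = 60, 2, 8
X = rng.standard_normal((n, p))
Z = sp.random(n, q, density=0.3, format="csc", random_state=2)
Lambda = sp.eye(q, format="csc") * 0.7
y = rng.standard_normal(n)
beta, u = solve_pls_schur(X, Z, Lambda, y)

ZLd = (Z @ Lambda).toarray()
A = np.block([[X.T @ X, X.T @ ZLd],
              [ZLd.T @ X, ZLd.T @ ZLd + np.eye(q)]])
sol = np.linalg.solve(A, np.concatenate([X.T @ y, ZLd.T @ y]))
assert np.allclose(np.concatenate([beta, u]), sol)
```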

Notes:

optimize

BOBYQA optimizer via NLOPT for variance component optimization.

Functions:

NameDescription
optimize_thetaOptimize theta using BOBYQA via NLOPT.
Functions
optimize_theta
optimize_theta(objective: Callable, theta0: np.ndarray, lower: np.ndarray, upper: np.ndarray, rhobeg: float = 0.002, rhoend: float = 2e-07, maxfun: int = 10000, verbose: bool = False) -> dict

Optimize theta using BOBYQA via NLOPT.

BOBYQA is Powell’s derivative-free trust-region method for bound-constrained optimization. This implementation uses NLOPT’s LN_BOBYQA algorithm.

Parameters:

NameTypeDescriptionDefault
objectiveCallableFunction f(theta) -> scalar deviance to minimize.required
theta0ndarrayInitial parameter values (array-like).required
lowerndarrayLower bounds for parameters.required
upperndarrayUpper bounds for parameters.required
rhobegfloatInitial trust region radius (sets the initial step size). Default 0.002, matching lme4.0.002
rhoendfloatFinal trust region radius (used to set convergence tolerances). Default 2e-7, matching lme4.2e-07
maxfunintMaximum function evaluations. Default 10000.10000
verboseboolPrint optimization progress.False

Returns:

TypeDescription
dictdict with keys: - theta: Optimized parameters (np.ndarray) - fun: Final objective value (float) - converged: Whether optimization succeeded (bool) - n_evals: Number of function evaluations (int) - message: Status message (str)

Examples:

>>> def rosenbrock(theta):
...     x, y = theta
...     return (1 - x)**2 + 100 * (y - x**2)**2
>>> result = optimize_theta(
...     rosenbrock,
...     theta0=[0.0, 0.0],
...     lower=[-2.0, -2.0],
...     upper=[2.0, 2.0]
... )
>>> result['converged']
True
>>> np.allclose(result['theta'], [1.0, 1.0], atol=1e-3)
True

pirls_sparse

Sparse PIRLS (Penalized Iteratively Reweighted Least Squares) core routines.

This module contains the inner-loop routines for fitting generalized linear mixed models via the Laplace approximation. These are called by the orchestrator in glmer.py and are not intended for direct use.

lme4 external.cpp (pwrssUpdate), MixedModels.jl pirls!
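A self-contained sketch of the PIRLS inner loop for a Bernoulli/logit model, with beta and theta held fixed (the helper name pirls_u and all data are illustrative, not this module's API):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def pirls_u(ZL, y, offset, max_iter=50, tol=1e-10):
    """Toy PIRLS: solve for spherical u with beta folded into `offset`
    and theta folded into ZL = Z @ Lambda (logit link)."""
    q = ZL.shape[1]
    u = np.zeros(q)
    I = sp.eye(q, format="csc")
    for _ in range(max_iter):
        eta = offset + ZL @ u
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)                  # IRLS weights (logit link)
        z = (eta - offset) + (y - mu) / w    # working response, random part
        WZL = sp.diags(w) @ ZL
        u_new = spsolve((ZL.T @ WZL + I).tocsc(), ZL.T @ (w * z))
        if np.max(np.abs(u_new - u)) < tol:
            return u_new
        u = u_new
    return u

rng = np.random.default_rng(3)
n, q = 200, 5
groups = rng.integers(0, q, size=n)
Z = sp.csc_matrix((np.ones(n), (np.arange(n), groups)), shape=(n, q))
ZL = Z * 0.8                                 # Lambda = 0.8 * I (illustrative)
offset = 0.2 * rng.standard_normal(n)        # stands in for X @ beta
p_true = 1.0 / (1.0 + np.exp(-(offset + ZL @ rng.standard_normal(q))))
y = (rng.uniform(size=n) < p_true).astype(float)

u_hat = pirls_u(ZL, y, offset)
# At the fixed point, the penalized score ZL'(y - mu) - u vanishes.
mu_hat = 1.0 / (1.0 + np.exp(-(offset + ZL @ u_hat)))
assert np.max(np.abs(ZL.T @ (y - mu_hat) - u_hat)) < 1e-6
```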

Functions
compute_glmm_loglik
compute_glmm_loglik(y: np.ndarray, mu: np.ndarray, family: Family, sqrL: float, logdet: float, prior_weights: np.ndarray | None = None) -> float

Compute GLMM log-likelihood via Laplace approximation.

Matches lme4’s logLik() for GLMMs: loglik = cond_loglik - 0.5 * (||u||^2 + logdet)

where cond_loglik is the full conditional log-likelihood including normalizing constants (log(y!), log(choose(m,y)), etc.).

For weighted binomial (grouped), uses the full binomial log-likelihood with log(choose(m, y_counts)) to match lme4’s aic() function.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `y` | `ndarray` | Response values (n,). For weighted binomial, these are proportions. | required |
| `mu` | `ndarray` | Fitted values (n,). | required |
| `family` | `Family` | GLM family with a `loglik` method. | required |
| `sqrL` | `float` | Sum of squared spherical random effects, `u'u`. | required |
| `logdet` | `float` | Log-determinant `log det(L'Z'WZL + I)`. | required |
| `prior_weights` | `ndarray` or `None` | Prior weights (n,). For binomial proportions, the number of trials per observation. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `float` | Log-likelihood (scalar). |
compute_irls_quantities
compute_irls_quantities(y: np.ndarray, eta: np.ndarray, family: Family) -> tuple[np.ndarray, np.ndarray]

Compute IRLS working weights and working response.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `y` | `ndarray` | Response values (n,) | required |
| `eta` | `ndarray` | Linear predictor (n,) | required |
| `family` | `Family` | GLM family object | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `working_weights` | `ndarray` | IRLS weights `w = 1 / (V(mu) * (deta/dmu)^2)` |
| `working_response` | `ndarray` | Pseudo-response `z = eta + (y - mu) * deta/dmu` |
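As an illustration of these formulas, here is a minimal NumPy sketch (not this module's implementation) specialized to the Bernoulli/logit family, where `V(mu) = mu*(1-mu)` and `deta/dmu = 1/(mu*(1-mu))`, so the weights simplify to `mu*(1-mu)`:

```python
import numpy as np

def irls_quantities_logit(y, eta):
    """Working weights and pseudo-response for the canonical logit link.

    w = 1 / (V(mu) * (deta/dmu)^2)  simplifies to  mu * (1 - mu)
    z = eta + (y - mu) * deta/dmu   =  eta + (y - mu) / (mu * (1 - mu))
    """
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = mu * (1.0 - mu)          # = dmu/deta = V(mu) for the canonical link
    z = eta + (y - mu) / w
    return w, z

y = np.array([0.0, 1.0, 1.0])
eta = np.zeros(3)
w, z = irls_quantities_logit(y, eta)
# at eta = 0: mu = 0.5, so w = 0.25 and z = (y - 0.5) / 0.25 = 4y - 2
```

At a flat starting point `eta = 0` the pseudo-response is simply a rescaled, centered copy of `y`, which is why IRLS for logistic models often starts from the null linear predictor.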
glmm_deviance
glmm_deviance(y: np.ndarray, mu: np.ndarray, family: Family, logdet: float, sqrL: float, prior_weights: np.ndarray | None = None) -> float

Compute GLMM deviance via Laplace approximation.

deviance_GLMM = sum(dev_resid) + logdet + ||u||^2

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `y` | `ndarray` | Response values (n,) | required |
| `mu` | `ndarray` | Fitted values `mu = g^{-1}(eta)` (n,) | required |
| `family` | `Family` | GLM family | required |
| `logdet` | `float` | Log-determinant `log det(L'Z'WZL + I)` | required |
| `sqrL` | `float` | Sum of squared spherical random effects, `u'u` | required |
| `prior_weights` | `ndarray` or `None` | Prior weights for observations (n,). For binomial with proportions, this is the number of trials. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `float` | GLMM deviance (scalar) |
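A toy sketch of how the three terms combine, assuming Poisson deviance residuals (illustrative only, not this module's code):

```python
import numpy as np

def poisson_dev_resids(y, mu):
    # Poisson unit deviance: 2*(y*log(y/mu) - (y - mu)), with y*log(y/mu) = 0 at y = 0
    term = np.where(y > 0, y * np.log(np.where(y > 0, y / mu, 1.0)), 0.0)
    return 2.0 * (term - (y - mu))

def laplace_glmm_deviance(y, mu, logdet, u):
    # deviance_GLMM = sum(dev_resid) + logdet + ||u||^2
    return float(np.sum(poisson_dev_resids(y, mu)) + logdet + u @ u)

y = np.array([1.0, 2.0])
mu = np.array([1.0, 2.0])   # perfect fit: deviance residuals are zero
dev = laplace_glmm_deviance(y, mu, logdet=0.5, u=np.array([1.0, 1.0]))
# dev = 0.0 + 0.5 + 2.0 = 2.5
```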
pirls_sparse
pirls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, y: np.ndarray, family: Family, eta_init: np.ndarray | None = None, prior_weights: np.ndarray | None = None, beta_fixed: np.ndarray | None = None, max_iter: int = 30, tol: float = 1e-07, verbose: bool = False, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict

Penalized Iteratively Reweighted Least Squares using sparse operations.

For fixed variance parameters encoded in Lambda, iteratively solves for beta and u using weighted penalized least squares. This is the inner loop of GLMM optimization.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `X` | `ndarray` | Fixed effects design matrix (n, p) | required |
| `Z` | `csc_matrix` | Random effects design matrix (n, q), scipy sparse | required |
| `Lambda` | `csc_matrix` | Cholesky-like factor from theta (q, q), scipy sparse | required |
| `y` | `ndarray` | Response vector (n,) | required |
| `family` | `Family` | GLM family (binomial, poisson) | required |
| `eta_init` | `ndarray` or `None` | Initial linear predictor (optional) | `None` |
| `prior_weights` | `ndarray` or `None` | Prior observation weights (n,). For binomial with proportions, this should be the number of trials. | `None` |
| `beta_fixed` | `ndarray` or `None` | If provided, solve only for u with beta held fixed. Used in Stage 2 optimization to evaluate deviance at arbitrary (theta, beta) points without re-optimizing beta. | `None` |
| `max_iter` | `int` | Maximum PIRLS iterations | `30` |
| `tol` | `float` | Convergence tolerance on relative deviance change (1e-7 matches lme4) | `1e-07` |
| `verbose` | `bool` | Print iteration info | `False` |
| `factor_S22` | `SparseFactorization` or `None` | Cached Cholesky factor for the S22 block (`L'Z'WZL + I`). If provided, reuses symbolic analysis across calls; useful for caching across theta evaluations in the optimizer. | `None` |
| `factor_S` | `SparseFactorization` or `None` | Cached Cholesky factor for the full augmented system. Only used when `beta_fixed` is `None` (full solve mode). | `None` |
| `pattern_template` | `PatternTemplate` or `None` | Template for preserving sparsity patterns. When provided, ensures S22 has a consistent pattern for cross-theta caching; required for cross-theta factor caching to work correctly when theta has boundary values (theta = 0). | `None` |

Returns:

`dict` with keys:

- `beta`: Fixed effects coefficients (p,)
- `u`: Spherical random effects (q,)
- `eta`: Final linear predictor (n,)
- `mu`: Final fitted values (n,)
- `deviance`: GLMM deviance at convergence
- `converged`: Whether PIRLS converged
- `n_iter`: Number of iterations
- `logdet_L`: Log-determinant `log det(L'Z'WZL + I)`
- `factor_S22`: Cholesky factor for S22 (for caching)
- `factor_S`: Cholesky factor for the full system (for caching, full mode only)
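The PIRLS iteration can be illustrated end-to-end with a dense toy sketch for a random-intercept logistic model. This is a didactic re-implementation on made-up data, not this module's sparse code:

```python
import numpy as np

rng = np.random.default_rng(42)
n, q, p = 60, 3, 2
grp = rng.integers(0, q, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Zd = np.eye(q)[grp]                   # dense random-intercept indicators
Lam = 0.8 * np.eye(q)                 # Lambda for fixed theta = 0.8
b = rng.normal(size=q)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 0.5 * X[:, 1] + b[grp])))).astype(float)

ZL = Zd @ Lam
eta, dev_prev = np.zeros(n), np.inf
for n_iter in range(1, 31):
    mu = 1 / (1 + np.exp(-eta))
    w = mu * (1 - mu)                 # logit working weights
    z = eta + (y - mu) / w            # working response
    Xw, ZLw = X * w[:, None], ZL * w[:, None]
    # Augmented weighted PLS system for (beta, u)
    S = np.block([[X.T @ Xw,  Xw.T @ ZL],
                  [ZL.T @ Xw, ZLw.T @ ZL + np.eye(q)]])
    rhs = np.concatenate([Xw.T @ z, ZLw.T @ z])
    beta, u = np.split(np.linalg.solve(S, rhs), [p])
    eta = X @ beta + ZL @ u
    mu = 1 / (1 + np.exp(-eta))
    # Laplace deviance: binomial deviance residuals + logdet + penalty
    dev_res = -2 * (y * np.log(mu) + (1 - y) * np.log1p(-mu))
    w_new = mu * (1 - mu)
    logdet = np.linalg.slogdet((ZL * w_new[:, None]).T @ ZL + np.eye(q))[1]
    dev = dev_res.sum() + logdet + u @ u
    if abs(dev - dev_prev) < 1e-7 * (abs(dev) + 0.1):
        break
    dev_prev = dev
```

The real routine differs in using sparse Cholesky factorizations with cached symbolic analysis, but the iteration structure (working weights, working response, augmented solve, deviance check) is the same.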
solve_weighted_pls_sparse
solve_weighted_pls_sparse(X: np.ndarray, Z: sp.csc_matrix, Lambda: sp.csc_matrix, z: np.ndarray, weights: np.ndarray, beta_fixed: np.ndarray | None = None, ZL: sp.csc_matrix | None = None, ZL_dense: np.ndarray | None = None, factor_S22: 'SparseFactorization | None' = None, factor_S: 'SparseFactorization | None' = None, pattern_template: 'PatternTemplate | None' = None) -> dict

Solve weighted Penalized Least Squares for GLMM.

    [ X'WX     X'WZL       ] [beta]   [ X'Wz   ]
    [ L'Z'WX   L'Z'WZL + I ] [ u  ] = [ L'Z'Wz ]

using sparse Cholesky factorization.

If beta_fixed is provided, solves only for u with beta held constant:

    [L'Z'WZL + I] [u] = [L'Z'W(z - X*beta)]

This is critical for Stage 2 optimization where we need to evaluate deviance at arbitrary (theta, beta) points without PIRLS re-optimizing beta.
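A dense NumPy sketch (illustrative only, not the sparse implementation) showing that the reduced beta_fixed solve reproduces the u from the full augmented solve when beta is the joint optimum:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 25, 2, 4
X = rng.normal(size=(n, p))
Z = (rng.random((n, q)) < 0.4).astype(float)
Lam = np.diag(rng.uniform(0.3, 1.2, q))   # stand-in for Lambda
z = rng.normal(size=n)
w = rng.uniform(0.1, 1.0, n)              # IRLS weights

ZL = Z @ Lam
Xw, ZLw = X * w[:, None], ZL * w[:, None]

# Full augmented solve for (beta, u)
S = np.block([[X.T @ Xw,  Xw.T @ ZL],
              [ZL.T @ Xw, ZLw.T @ ZL + np.eye(q)]])
rhs = np.concatenate([Xw.T @ z, ZLw.T @ z])
beta, u_full = np.split(np.linalg.solve(S, rhs), [p])

# Reduced solve with beta held fixed: [L'Z'WZL + I] u = L'Z'W (z - X beta)
S22 = ZLw.T @ ZL + np.eye(q)
u_fixed = np.linalg.solve(S22, ZLw.T @ (z - X @ beta))
# u_fixed equals u_full because the second block row of the full system
# is exactly the reduced system once beta is substituted in
```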

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `X` | `ndarray` | Fixed effects design matrix (n, p) | required |
| `Z` | `csc_matrix` | Random effects design matrix (n, q), scipy sparse CSC | required |
| `Lambda` | `csc_matrix` | Cholesky-like factor (q, q), scipy sparse CSC | required |
| `z` | `ndarray` | Working response (n,) | required |
| `weights` | `ndarray` | IRLS weights (n,) | required |
| `beta_fixed` | `ndarray` or `None` | If provided, solve only for u with beta fixed to this value. | `None` |
| `ZL` | `csc_matrix` or `None` | Pre-computed `Z @ Lambda` (sparse). If provided, avoids redundant computation when called repeatedly with the same Z and Lambda. | `None` |
| `ZL_dense` | `ndarray` or `None` | Pre-computed `ZL.toarray()` (dense). If provided, avoids repeated sparse-to-dense materialization on each call. | `None` |
| `factor_S22` | `SparseFactorization` or `None` | Cached Cholesky factor for the S22 block. If provided, uses `cholesky_inplace()` to update with new values (same sparsity pattern), avoiding expensive symbolic analysis on repeated calls. | `None` |
| `factor_S` | `SparseFactorization` or `None` | Cached Cholesky factor for the full augmented system S. Only used in full solve mode (when `beta_fixed` is `None`). | `None` |
| `pattern_template` | `PatternTemplate` or `None` | Template for preserving sparsity patterns. When provided, ensures S22 has a consistent pattern for cross-theta caching; required when using `factor_S22` with cross-theta caching. | `None` |

Returns:

`dict` with keys:

- `beta`: Fixed effects coefficients (p,)
- `u`: Spherical random effects (q,)
- `logdet_L`: Log-determinant of `(L'Z'WZL + I)`
- `fitted_z`: Fitted working response `X beta + ZL u`
- `factor_S22`: Cholesky factor for S22 (for caching)
- `factor_S`: Cholesky factor for the full system S (for caching, full mode only)

quadrature

Adaptive Gauss-Hermite Quadrature (AGQ) for GLMM marginal likelihood.

This module implements nAGQ > 1 integration for scalar random intercept models. The AGQ deviance replaces the Laplace deviance in Stage 2 of GLMM fitting when higher accuracy is needed (few observations per group, extreme probabilities).

Restriction: nAGQ > 1 only works for models with a single scalar random intercept (e.g., (1|group)). This matches lme4’s restriction.

Math

For group i with conditional mode u_hat_i and conditional SD sigma_hat_i:

.. math::

   u_{ij} = \hat{u}_i + \sqrt{2}\,\hat{\sigma}_i\,z_j
   \quad (\text{adaptive nodes})

.. math::

   h(u_{ij}) = \sum_{k \in i} \log f\bigl(y_k \mid \mu(\eta_{ij,k})\bigr)
   - \tfrac{1}{2} u_{ij}^2
   \quad (\text{log integrand})

.. math::

   \log I_i = \log(\hat{\sigma}_i)
   + \operatorname{logsumexp}_j\bigl(\log(w_j/\sqrt{\pi}) + h(u_{ij}) + z_j^2\bigr)

.. math::

   \text{deviance} = -2 \sum_i \log I_i
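The formulas above can be checked with a small NumPy-only sketch (illustrative, not this module's code). For a Gaussian log-integrand the adaptive rule is exact, which gives a convenient self-check:

```python
import numpy as np

def agq_log_integral(h, u_hat, sigma_hat, k=5):
    """log I = log(sigma) + logsumexp_j(log(w_j/sqrt(pi)) + h(u_j) + z_j^2),
    with adaptive nodes u_j = u_hat + sqrt(2)*sigma_hat*z_j.
    h(u) must include the -u^2/2 penalty term."""
    z, w = np.polynomial.hermite.hermgauss(k)   # Gauss-Hermite nodes/weights
    u = u_hat + np.sqrt(2.0) * sigma_hat * z
    terms = np.log(w / np.sqrt(np.pi)) + np.array([h(ui) for ui in u]) + z**2
    m = terms.max()                             # logsumexp, done stably
    return np.log(sigma_hat) + m + np.log(np.exp(terms - m).sum())

# Self-check: take the "conditional log-likelihood" to be a N(a, s^2) density
# in u; the marginal integral against N(0, 1) is then N(a; 0, 1 + s^2),
# and AGQ at the exact mode/SD reproduces it to machine precision.
a, s = 0.7, 0.5
h = lambda u: -(u - a)**2 / (2*s**2) - np.log(np.sqrt(2*np.pi)*s) - u**2/2
u_hat = a / (1 + s**2)                  # conditional mode of h
sigma_hat = s / np.sqrt(1 + s**2)       # conditional SD (from -1/h'')
exact = -0.5*np.log(2*np.pi*(1 + s**2)) - a**2 / (2*(1 + s**2))
approx = agq_log_integral(h, u_hat, sigma_hat, k=5)
```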

Reference

Pinheiro & Bates (1995), Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model, JCGS 4(1):12-35.

lme4 source: glmFamily.cpp aic() and laplace().

Functions:

NameDescription
compute_agq_devianceCompute AGQ deviance for Stage 2 optimization.
Functions
compute_agq_deviance
compute_agq_deviance(theta: np.ndarray, beta: np.ndarray, X: np.ndarray, Z: sp.csc_matrix, y: np.ndarray, family: Family, n_groups_list: list[int], re_structure: str, group_ids: np.ndarray, nAGQ: int, metadata: dict | None = None, prior_weights: np.ndarray | None = None, pirls_max_iter: int = 30, pirls_tol: float = 1e-07, eta_init: np.ndarray | None = None, lambda_template: 'LambdaTemplate | None' = None, factor_cache: dict | None = None) -> float

Compute AGQ deviance for Stage 2 optimization.

This is the objective function for nAGQ > 1. It:

  1. Runs PIRLS with beta_fixed to get conditional modes u_hat

  2. Computes per-group conditional SDs from the S22 diagonal

  3. Evaluates the AGQ integral using Gauss-Hermite quadrature

  4. Returns -2 * sum_i(log_integral_i)

Only valid for scalar random intercept models (single grouping factor, intercept only).
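For step 2 above, the scalar-intercept case makes S22 diagonal, so the per-group conditional SDs fall out directly. A hedged sketch (illustrative values, not this module's code):

```python
import numpy as np

# For a scalar random intercept with Lambda = theta * I, each row of ZL has one
# nonzero, so S22 = L'Z'WZL + I is diagonal with entries
#   S22_ii = theta^2 * sum_{k in group i} w_k + 1,
# and the conditional SD of u_i is sigma_i = 1 / sqrt(S22_ii).
theta = 0.9
w = np.array([0.2, 0.25, 0.1, 0.3, 0.15])   # IRLS weights at the modes
grp = np.array([0, 0, 1, 1, 1])             # integer-coded group ids
s22_diag = theta**2 * np.bincount(grp, weights=w) + 1.0
sigma = 1.0 / np.sqrt(s22_diag)
```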

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `theta` | `ndarray` | Cholesky factor elements (scalar for random intercept). | required |
| `beta` | `ndarray` | Fixed effects coefficients (p,). | required |
| `X` | `ndarray` | Fixed effects design matrix (n, p). | required |
| `Z` | `csc_matrix` | Random effects design matrix (n, q), scipy sparse. | required |
| `y` | `ndarray` | Response vector (n,). | required |
| `family` | `Family` | GLM family (binomial, poisson). | required |
| `n_groups_list` | `list[int]` | Number of groups per grouping factor. | required |
| `re_structure` | `str` | Random effects structure type. | required |
| `group_ids` | `ndarray` | Group membership array (n,), integer-coded. | required |
| `nAGQ` | `int` | Number of quadrature points (must be > 1). | required |
| `metadata` | `dict` or `None` | Additional RE metadata. | `None` |
| `prior_weights` | `ndarray` or `None` | Prior observation weights. | `None` |
| `pirls_max_iter` | `int` | Maximum PIRLS iterations per evaluation. | `30` |
| `pirls_tol` | `float` | PIRLS convergence tolerance. | `1e-07` |
| `eta_init` | `ndarray` or `None` | Initial linear predictor for PIRLS. | `None` |
| `lambda_template` | `LambdaTemplate` or `None` | Pre-built template for efficient Lambda updates. | `None` |
| `factor_cache` | `dict` or `None` | Mutable dict for caching Cholesky factors. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `float` | AGQ deviance (scalar), `-2 * sum(log marginal likelihood)`. |