
Formula parsing, design matrix construction, and newdata evaluation.

Call chain:

model(formula, data) -> parse_formula() -> build_bundle_from_data() -> build_design_matrices()
model.predict(newdata=) -> evaluate_newdata() (applies learned encoding to new observations)

Classes:

DesignResult: Output of build_design_matrices(). Separates arrays from metadata.
FormulaError: Exception raised for formula parsing errors.
TermResult: Result of evaluating one formula term.

Functions:

build_design_matrices: Build X and y matrices from a parsed formula spec.
build_random_effects_from_spec: Build random effects design matrix from FormulaSpec.
expand_double_verts: Expand || syntax into separate uncorrelated random effects terms.
expand_nested_syntax: Expand nested / syntax into separate crossed random effects terms.
parse_formula: Parse formula and detect categoricals from data.

Modules:

bundle: Data bundle construction from formula and DataFrame.
contrast_registry: Contrast function registry for explore formulas.
contrast_specs: Contrast specification resolution for design matrix coding.
design: Design matrix construction from FormulaSpec.
encoding: Categorical variable encoding using Polars Enum.
evaluate: Term evaluation — AST nodes to design matrix columns.
evaluate_contrast: Contrast function evaluation for formula-syntax categorical encoding.
evaluate_newdata: Newdata evaluation — apply learned encoding to new observations.
evaluate_transforms: Transform evaluation — stateful, math, and polynomial transforms.
helpers: Shared AST utilities for formula operations.
parse: R-style formula string parsing into FormulaSpec containers.
parser: Recursive descent parser for statistical formula strings.
random_effects: Random effects Z matrix construction from FormulaSpec.

Classes

DesignResult

Output of build_design_matrices(). Separates arrays from metadata.

Attributes:

X (NDArray[float64]): Fixed effects design matrix of shape (n_obs, n_features).
X_labels (tuple[str, ...]): Column names for X matrix.
y (NDArray[float64] | None): Response vector of shape (n_obs,), or None.
y_label (str | None): Name of response variable, or None.

Attributes

X
X: NDArray[np.float64]
X_labels
X_labels: tuple[str, ...]
n_obs
n_obs: int

Number of observations.

y
y: NDArray[np.float64] | None = None
y_label
y_label: str | None = None

FormulaError

FormulaError(message: str, formula: str | None = None, position: int | None = None) -> None

Bases: ValueError

Exception raised for formula parsing errors.

Provides helpful error messages with pointer to error position.

Parameters:

message (str): Error description. Required.
formula (str | None): The formula that caused the error. Default: None.
position (int | None): Character position of the error. Default: None.

Attributes:

formula
position

Attributes

formula
formula = formula
position
position = position

TermResult

Result of evaluating one formula term.

Attributes:

columns (NDArray[float64]): Data array, shape (n_obs,) or (n_obs, k).
labels (list[str]): Column names for the result.
state_updates (dict): Partial state to merge back into accumulators. May contain keys: "factors", "contrast_matrices", "contrast_types", "transform_state", "transforms".

Attributes

columns
columns: NDArray[np.float64]
labels
labels: list[str]
state_updates
state_updates: dict

Functions

build_design_matrices

build_design_matrices(spec: FormulaSpec, data: pl.DataFrame) -> tuple[DesignResult, FormulaSpec]

Build X and y matrices from a parsed formula spec.

Evaluates terms, learns encoding (contrasts, transforms), and returns both the design matrices AND an updated FormulaSpec with learned state.

The returned FormulaSpec has contrast_matrices, contrast_types, and transform_state populated (they may be empty in the input spec from parse_formula if this is the first build).

Parameters:

spec (FormulaSpec): FormulaSpec from parse_formula(). Required.
data (DataFrame): Polars DataFrame with training data. Required.

Returns:

tuple[DesignResult, FormulaSpec]: Matrices plus updated spec with learned encoding.

build_random_effects_from_spec

build_random_effects_from_spec(spec: FormulaSpec, data: pl.DataFrame) -> RandomEffectsInfo | None

Build random effects design matrix from FormulaSpec.

Parameters:

spec (FormulaSpec): Parsed formula specification with re_terms. Required.
data (DataFrame): Training data (Polars DataFrame). Required.

Returns:

RandomEffectsInfo | None: RandomEffectsInfo with Z matrix and metadata, or None if no RE terms.

expand_double_verts

expand_double_verts(formula: str) -> tuple[str, dict]

Expand || syntax into separate uncorrelated random effects terms.

This matches lme4’s expandDoubleVerts() function. The || syntax creates independent (uncorrelated) random effects by expanding to separate terms.

Parameters:

formula (str): R-style formula string potentially containing || terms. Required.

Returns:

tuple[str, dict]: A tuple of the expanded formula string with || replaced by separate terms, and a metadata dict tracking which terms came from || expansion.

Examples:

>>> expand_double_verts("y ~ x + (Days || Subject)")
('y ~ x + (1 | Subject) + (0 + Days | Subject)', {...})
>>> expand_double_verts("y ~ x + (1 + x + y || group)")
('y ~ x + (1 | group) + (0 + x | group) + (0 + y | group)', {...})

Note: As the examples show, an explicit intercept expands to (1 | group) and each remaining term expands to (0 + term | group); an implicit intercept, as in (Days || Subject), is made explicit.
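
The expansion in the examples above can be sketched in a few lines. This is an illustrative reimplementation, not the library's code: the real expand_double_verts() also returns the metadata dict and handles syntax this sketch ignores.

```python
import re

def expand_double_verts_sketch(formula: str) -> str:
    """Sketch of the documented || expansion:
    (t1 + t2 || g) -> (1 | g) + (0 + t1 | g) + (0 + t2 | g)."""
    def expand(match: re.Match) -> str:
        lhs, group = match.group(1).strip(), match.group(2).strip()
        terms = [t.strip() for t in lhs.split("+")]
        parts = [f"(1 | {group})" if t == "1" else f"(0 + {t} | {group})"
                 for t in terms if t != "0"]
        if "1" not in terms and "0" not in terms:
            # An implicit intercept is assumed, as in (Days || Subject)
            parts.insert(0, f"(1 | {group})")
        return " + ".join(parts)
    return re.sub(r"\(([^|()]+)\|\|([^()]+)\)", expand, formula)
```

Running it on the documented examples reproduces the documented output strings.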

expand_nested_syntax

expand_nested_syntax(formula: str) -> tuple[str, dict]

Expand nested / syntax into separate crossed random effects terms.

This matches lme4’s behavior where nested syntax (a/b) is syntactic sugar for separate terms: (1|a/b) expands to (1|a) + (1|a:b).

The : in the grouping factor creates an interaction grouping where each unique combination of levels becomes a separate group.

Parameters:

formula (str): R-style formula string potentially containing / in RE terms. Required.

Returns:

tuple[str, dict]: A tuple of the expanded formula string with / replaced by separate terms, and a metadata dict tracking which terms came from / expansion.

Examples:

>>> expand_nested_syntax("y ~ x + (1|school/class)")
('y ~ x + (1|school) + (1|school:class)', {...})
>>> expand_nested_syntax("y ~ x + (1|a/b/c)")
('y ~ x + (1|a) + (1|a:b) + (1|a:b:c)', {...})
>>> expand_nested_syntax("y ~ x + (Days|Subject/Session)")
('y ~ x + (Days|Subject) + (Days|Subject:Session)', {...})

Note: As the examples show, each level of nesting adds a term grouped by the interaction of all enclosing factors: (lhs | a/b/c) expands to (lhs | a) + (lhs | a:b) + (lhs | a:b:c).
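
The cumulative-interaction expansion can be sketched directly from the examples. This is an illustrative reimplementation; the library's expand_nested_syntax() also returns the metadata dict.

```python
import re

def expand_nested_sketch(formula: str) -> str:
    """Sketch of the documented a/b/c expansion:
    (lhs | a/b/c) -> (lhs | a) + (lhs | a:b) + (lhs | a:b:c)."""
    def expand(match: re.Match) -> str:
        lhs, grouping = match.group(1).strip(), match.group(2).strip()
        factors = [f.strip() for f in grouping.split("/")]
        # Each nesting level groups by the interaction of all enclosing factors
        return " + ".join(
            f"({lhs}|{':'.join(factors[: i + 1])})" for i in range(len(factors))
        )
    return re.sub(r"\(([^|()]+)\|([^()|]*/[^()|]+)\)", expand, formula)
```

Applied to the documented examples, the sketch yields the same expanded strings.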

parse_formula

parse_formula(formula: str, data: pl.DataFrame, *, factors: dict[str, list[str]] | None = None, custom_contrasts: dict[str, NDArray] | None = None) -> FormulaSpec

Parse formula and detect categoricals from data.

This does parsing + categorical detection but NOT matrix construction. Reuses parser/ for tokenization and AST construction.

Parameters:

formula (str): R-style formula string (e.g., "y ~ x + z"). Required.
data (DataFrame): Polars DataFrame to detect categoricals from. Required.
factors (dict[str, list[str]] | None): Optional dict mapping column names to level orderings. If provided, these orderings are used for categorical encoding. Default: None.
custom_contrasts (dict[str, NDArray] | None): Optional dict of user-provided contrast matrices. Default: None.

Returns:

FormulaSpec: FormulaSpec with parsed terms and detected factor levels.

Modules

bundle

Data bundle construction from formula and DataFrame.

Orchestrates formula parsing, design matrix construction, missing value handling, weight validation, rank deficiency detection, and random effects metadata to produce a DataBundle. Extracted from model/core.py.

Functions:

build_bundle_from_data: Build a DataBundle and learned FormulaSpec from a model spec and data.
filter_valid_rows: Filter a DataFrame to only valid (non-NA) rows using a boolean mask.

Functions

build_bundle_from_data
build_bundle_from_data(*, spec: ModelSpec, formula: str, data: pl.DataFrame, custom_contrasts: dict[str, np.ndarray] | None, weights_col: str | None, offset_col: str | None = None, missing: str) -> tuple[DataBundle, FormulaSpec]

Build a DataBundle and learned FormulaSpec from a model spec and data.

Handles the full pipeline: formula parsing, design matrix construction, missing value handling, weight validation, offset extraction, rank deficiency detection, family-specific response validation, and random effects metadata.

Parameters:

spec (ModelSpec): Model specification with parsed formula info. Required.
formula (str): Raw formula string (e.g. "y ~ x + (1 | group)"). Required.
data (DataFrame): Input data as a Polars DataFrame. Required.
custom_contrasts (dict[str, ndarray] | None): User-specified contrast matrices, or None. Required.
weights_col (str | None): Name of the weights column in data, or None. Required.
offset_col (str | None): Name of the offset column in data, or None. Default: None.
missing (str): How to handle missing values ("drop" or "fail"). Required.

Returns:

tuple[DataBundle, FormulaSpec]: Tuple of (DataBundle, FormulaSpec). The FormulaSpec is needed for consistent newdata evaluation via evaluate_newdata().

filter_valid_rows
filter_valid_rows(data: pl.DataFrame | None, valid_mask: np.ndarray | None) -> pl.DataFrame | None

Filter a DataFrame to only valid (non-NA) rows using a boolean mask.

Returns the data unchanged if no filtering is needed (data is None, mask is None, or all rows are valid).

Parameters:

data (DataFrame | None): Polars DataFrame to filter, or None. Required.
valid_mask (ndarray | None): Boolean array indicating valid rows, or None. Required.

Returns:

DataFrame | None: Filtered DataFrame, or None if data was None.

contrast_registry

Contrast function registry for explore formulas.

Centralizes the vocabulary of contrast function names, aliases, and parameter requirements. Used by the explore parser and contrast dispatch logic to ensure consistent naming.

Functions:

resolve_contrast_name: Resolve a contrast function name to its canonical form.

Attributes:

CONTRAST_ALIASES (dict[str, str])
DEGREE_FUNCTIONS (frozenset[str])
MODEL_CONTRAST_FUNCTIONS (frozenset[str])
OMIT_FUNCTIONS (frozenset[str])
ORDER_DEPENDENT (frozenset[str])
REF_FUNCTIONS (frozenset[str])
VALID_CONTRAST_FUNCTIONS (frozenset[str])

Attributes

CONTRAST_ALIASES
CONTRAST_ALIASES: dict[str, str] = {'pairwise': 'pairwise', 'sequential': 'sequential', 'poly': 'poly', 'treatment': 'treatment', 'dummy': 'treatment', 'sum': 'sum', 'deviation': 'sum', 'helmert': 'helmert'}
DEGREE_FUNCTIONS
DEGREE_FUNCTIONS: frozenset[str] = frozenset({'poly'})
MODEL_CONTRAST_FUNCTIONS
MODEL_CONTRAST_FUNCTIONS: frozenset[str] = frozenset({k for k, v in (CONTRAST_ALIASES.items()) if v != 'pairwise'})
OMIT_FUNCTIONS
OMIT_FUNCTIONS: frozenset[str] = frozenset({'sum', 'deviation'})
ORDER_DEPENDENT
ORDER_DEPENDENT: frozenset[str] = frozenset({'sequential', 'poly', 'helmert'})
REF_FUNCTIONS
REF_FUNCTIONS: frozenset[str] = frozenset({'treatment', 'dummy'})
VALID_CONTRAST_FUNCTIONS
VALID_CONTRAST_FUNCTIONS: frozenset[str] = frozenset(CONTRAST_ALIASES.keys())

Functions

resolve_contrast_name
resolve_contrast_name(name: str) -> str

Resolve a contrast function name to its canonical form.

Parameters:

name (str): Contrast function name (may be an alias). Required.

Returns:

str: Canonical contrast name.
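
Given the CONTRAST_ALIASES mapping shown in the Attributes section above, resolution is a dictionary lookup. A minimal sketch (the error type raised for unknown names is an assumption, not documented here):

```python
# CONTRAST_ALIASES as listed in the Attributes section above
CONTRAST_ALIASES = {
    'pairwise': 'pairwise', 'sequential': 'sequential', 'poly': 'poly',
    'treatment': 'treatment', 'dummy': 'treatment',
    'sum': 'sum', 'deviation': 'sum', 'helmert': 'helmert',
}

def resolve_contrast_name_sketch(name: str) -> str:
    # Aliases map onto canonical names; unknown names fail loudly
    if name not in CONTRAST_ALIASES:
        raise ValueError(f"Unknown contrast function: {name!r}")
    return CONTRAST_ALIASES[name]
```

For example, 'dummy' resolves to 'treatment' and 'deviation' to 'sum'.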

contrast_specs

Contrast specification resolution for design matrix coding.

Resolves user-facing contrast specs (strings, tuples, ndarrays) into concrete contrast matrices. This is pure validation + dispatch logic extracted from the model class.

resolve_contrast_specs: Validate and resolve contrast specifications.

Functions:

resolve_contrast_specs: Resolve user-facing contrast specs into concrete contrast matrices.
validate_constructor_contrasts: Validate the contrasts= kwarg passed to the model constructor.

Functions

resolve_contrast_specs
resolve_contrast_specs(data: pl.DataFrame, contrasts: dict[str, object]) -> dict[str, NDArray[np.float64]]

Resolve user-facing contrast specs into concrete contrast matrices.

Takes the raw contrast specifications provided by the user (strings, tuples, ndarrays) and validates them against the data, returning a dict mapping column names to contrast matrices.

Parameters:

data (DataFrame): Polars DataFrame containing the model data. Used to validate column existence and extract factor levels. Required.
contrasts (dict[str, object]): Mapping of column names to contrast specifications. Each value can be a string ('treatment', 'sum', 'helmert', 'poly', or 'sequential'); a tuple such as ('treatment', 'B') for treatment coding with 'B' as reference, or ('sum', 'A') for sum coding omitting 'A'; or an ndarray giving a custom contrast matrix of shape (n_levels, n_levels - 1). Required.

Returns:

dict[str, NDArray[float64]]: Dict mapping column names to contrast matrices (each of shape (n_levels, n_levels - 1)).

validate_constructor_contrasts
validate_constructor_contrasts(contrasts: dict, data: pl.DataFrame | None) -> None

Validate the contrasts= kwarg passed to the model constructor.

Each value must be an ndarray of shape (n_levels, n_levels - 1) where n_levels is the number of unique values in the column.

Parameters:

contrasts (dict): Dict mapping column names to ndarray contrast matrices. Required.
data (DataFrame | None): The model's data (may be None for simulation-first). Required.

design

Design matrix construction from FormulaSpec.

Classes:

DesignResult: Output of build_design_matrices(). Separates arrays from metadata.

Functions:

build_design_matrices: Build X and y matrices from a parsed formula spec.

Classes

DesignResult

Output of build_design_matrices(). Separates arrays from metadata.

Attributes:

X (NDArray[float64]): Fixed effects design matrix of shape (n_obs, n_features).
X_labels (tuple[str, ...]): Column names for X matrix.
y (NDArray[float64] | None): Response vector of shape (n_obs,), or None.
y_label (str | None): Name of response variable, or None.

Attributes
X
X: NDArray[np.float64]
X_labels
X_labels: tuple[str, ...]
n_obs
n_obs: int

Number of observations.

y
y: NDArray[np.float64] | None = None
y_label
y_label: str | None = None

Functions

build_design_matrices
build_design_matrices(spec: FormulaSpec, data: pl.DataFrame) -> tuple[DesignResult, FormulaSpec]

Build X and y matrices from a parsed formula spec.

Evaluates terms, learns encoding (contrasts, transforms), and returns both the design matrices AND an updated FormulaSpec with learned state.

The returned FormulaSpec has contrast_matrices, contrast_types, and transform_state populated (they may be empty in the input spec from parse_formula if this is the first build).

Parameters:

spec (FormulaSpec): FormulaSpec from parse_formula(). Required.
data (DataFrame): Polars DataFrame with training data. Required.

Returns:

tuple[DesignResult, FormulaSpec]: Matrices plus updated spec with learned encoding.

encoding

Categorical variable encoding using Polars Enum.

Functions:

detect_categoricals: Detect categorical variables from a formula AST.
detect_levels: Infer level ordering from a non-categorical series.
encode_categorical: Encode a categorical series using a contrast matrix.
ensure_enum: Convert columns to Enum type with specified level ordering.
get_levels: Get the level ordering from an Enum or Categorical series.

Functions

detect_categoricals
detect_categoricals(ast: Binary | Call | Variable | object, data: pl.DataFrame) -> dict[str, list[str]]

Detect categorical variables from a formula AST.

Walks the AST to find:

  1. Explicit categorical markers: factor(x), T(x), S(x)

  2. String columns referenced in the formula

For explicit markers, extracts level information if provided. For implicit string columns, infers levels from data.

Parameters:

ast (Binary | Call | Variable | object): Parsed formula AST (from parser). Required.
data (DataFrame): DataFrame to check column types against. Required.

Returns:

dict[str, list[str]]: Dict mapping column names to ordered level lists.

Examples:

>>> from parser import Scanner, Parser
>>> import polars as pl
>>> tokens = Scanner('y ~ factor(group) + age').scan()
>>> ast = Parser(tokens).parse()
>>> df = pl.DataFrame({'y': [1, 2], 'group': ['A', 'B'], 'age': [30, 40]})
>>> detect_categoricals(ast, df)
{'group': ['A', 'B']}
detect_levels
detect_levels(series: pl.Series) -> list[str]

Infer level ordering from a non-categorical series.

For string columns that haven’t been converted to Enum yet, infers levels by getting unique values and sorting them.

For numeric columns used with factor(), formats integer values without decimal points (e.g., 6.0 becomes “6”).

Parameters:

series (Series): Polars series (any type). Required.

Returns:

list[str]: List of unique values, sorted alphabetically/numerically.

Examples:

>>> import polars as pl
>>> s = pl.Series('x', ['B', 'A', 'C', 'A'])
>>> detect_levels(s)
['A', 'B', 'C']
>>> s = pl.Series('x', [6.0, 4.0, 8.0, 4.0])
>>> detect_levels(s)
['4', '6', '8']
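
The integer-formatting rule from the second example can be sketched with a hypothetical helper (format_level is not part of this module's API; multi-digit levels may sort differently under the real numeric ordering):

```python
def format_level(value: object) -> str:
    # Integer-valued floats drop the decimal point (6.0 -> "6"),
    # mirroring the documented detect_levels() behavior for factor() columns
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)

levels = sorted({format_level(v) for v in [6.0, 4.0, 8.0, 4.0]})
```

Here levels comes out as ['4', '6', '8'], matching the doctest above.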
encode_categorical
encode_categorical(series: pl.Series, contrast: NDArray[np.float64]) -> NDArray[np.float64]

Encode a categorical series using a contrast matrix.

Takes a Polars series (must be Enum or Categorical type) and applies contrast encoding by indexing into the contrast matrix with the integer codes.

Parameters:

series (Series): Polars series with Enum or Categorical dtype. Required.
contrast (NDArray[float64]): Contrast matrix of shape (n_levels, n_columns). Row order must match the series' category order. Required.

Returns:

NDArray[float64]: Encoded array of shape (n_obs, n_columns).

Examples:

>>> import polars as pl
>>> from coding import treatment_coding
>>> series = pl.Series('x', ['B', 'A', 'C']).cast(pl.Enum(['A', 'B', 'C']))
>>> contrast = treatment_coding(['A', 'B', 'C'])
>>> encode_categorical(series, contrast)
array([[1., 0.],
       [0., 0.],
       [0., 1.]])
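
The core encoding step is plain NumPy row indexing. A self-contained sketch with a hand-written treatment matrix (a stand-in for the coding module's builder used in the doctest above):

```python
import numpy as np

# Treatment coding for levels ['A', 'B', 'C']: the reference level 'A'
# gets the all-zero row; each other level gets an indicator column.
contrast = np.array([[0.0, 0.0],   # A (reference)
                     [1.0, 0.0],   # B
                     [0.0, 1.0]])  # C

codes = np.array([1, 0, 2])        # integer codes for observations B, A, C
encoded = contrast[codes]          # row indexing applies the encoding per observation
```

This reproduces the array in the example: rows for B, A, C become [1, 0], [0, 0], [0, 1].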
ensure_enum
ensure_enum(data: pl.DataFrame, factors: dict[str, list[str]]) -> pl.DataFrame

Convert columns to Enum type with specified level ordering.

This function converts string/categorical columns to Polars Enum type, ensuring consistent level ordering. If a column is already an Enum with matching levels, it is left unchanged.

For numeric columns, applies the same formatting as detect_levels() to ensure consistency (e.g., 6.0 -> “6”).

Parameters:

data (DataFrame): DataFrame to modify. Required.
factors (dict[str, list[str]]): Dict mapping column names to ordered level lists. Level order determines reference category (first = reference). Required.

Returns:

DataFrame: DataFrame with specified columns converted to Enum type.

Examples:

>>> import polars as pl
>>> df = pl.DataFrame({'group': ['B', 'A', 'C', 'A', 'B']})
>>> df = ensure_enum(df, {'group': ['A', 'B', 'C']})
>>> df['group'].dtype
Enum(categories=['A', 'B', 'C'])
get_levels
get_levels(series: pl.Series) -> list[str]

Get the level ordering from an Enum or Categorical series.

Parameters:

series (Series): Polars series with Enum or Categorical dtype. Required.

Returns:

list[str]: List of category levels in their defined order.

Examples:

>>> import polars as pl
>>> s = pl.Series('x', ['B', 'A']).cast(pl.Enum(['A', 'B', 'C']))
>>> get_levels(s)
['A', 'B', 'C']

evaluate

Term evaluation — AST nodes to design matrix columns.

Classes:

TermResult: Result of evaluating one formula term.

Functions:

evaluate_call: Evaluate a function call term.
evaluate_categorical: Evaluate a categorical variable with contrast encoding.
evaluate_interaction: Evaluate an interaction term (a:b).
evaluate_star: Evaluate a * term (main effects + interaction): a * b = a + b + a:b.
evaluate_term: Evaluate a single formula term against data.
evaluate_variable: Evaluate a simple variable reference.

Classes

TermResult

Result of evaluating one formula term.

Attributes:

columns (NDArray[float64]): Data array, shape (n_obs,) or (n_obs, k).
labels (list[str]): Column names for the result.
state_updates (dict): Partial state to merge back into accumulators. May contain keys: "factors", "contrast_matrices", "contrast_types", "transform_state", "transforms".

Attributes
columns
columns: NDArray[np.float64]
labels
labels: list[str]
state_updates
state_updates: dict

Functions

evaluate_call
evaluate_call(call: Call, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool) -> tuple[TermResult, bool]

Evaluate a function call term.

Handles: contrast functions, transform, center/norm/zscore/scale, log/log10/sqrt.

Parameters:

call (Call): Call AST node. Required.
data (DataFrame): Polars DataFrame. Required.
factors (dict[str, list[str]]): Current factor levels mapping. Required.
contrast_matrices (dict[str, NDArray]): Current contrast matrices. Required.
transform_state (dict[str, dict]): Current transform state mapping. Required.
transforms (dict[str, object]): Current fitted transform instances. Required.
custom_contrasts (dict[str, NDArray]): User-provided contrast matrices. Required.
intercept_absorbed (bool): Whether intercept df has been absorbed. Required.

Returns:

tuple[TermResult, bool]: (TermResult, updated_intercept_absorbed).

evaluate_categorical
evaluate_categorical(name: str, series: pl.Series, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool, *, spans_intercept: bool | None = None, contrast_type: str | None = None) -> TermResult

Evaluate a categorical variable with contrast encoding.

Parameters:

name (str): Variable name. Required.
series (Series): Polars series with categorical data. Required.
factors (dict[str, list[str]]): Current factor levels mapping. Required.
contrast_matrices (dict[str, NDArray]): Current contrast matrices. Required.
custom_contrasts (dict[str, NDArray]): User-provided contrast matrices. Required.
intercept_absorbed (bool): Whether intercept df has been absorbed. Required.
spans_intercept (bool | None): Whether this categorical should span the intercept. If None, determined automatically. Default: None.
contrast_type (str | None): Type of contrast to use. If None, defaults to "treatment". Default: None.

Returns:

TermResult: TermResult with encoded data and state updates.

evaluate_interaction
evaluate_interaction(term: Binary, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool, *, is_toplevel: bool = True) -> tuple[TermResult, bool]

Evaluate an interaction term (a:b).

Parameters:

term (Binary): Binary AST node with COLON operator. Required.
data (DataFrame): Polars DataFrame. Required.
factors (dict[str, list[str]]): Current factor levels mapping. Required.
contrast_matrices (dict[str, NDArray]): Current contrast matrices. Required.
transform_state (dict[str, dict]): Current transform state. Required.
transforms (dict[str, object]): Current fitted transforms. Required.
custom_contrasts (dict[str, NDArray]): User-provided contrast matrices. Required.
intercept_absorbed (bool): Whether intercept df has been absorbed. Required.
is_toplevel (bool): Whether this is a top-level interaction term. Default: True.

Returns:

tuple[TermResult, bool]: (TermResult, updated_intercept_absorbed).

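
Column-wise products are the heart of a:b evaluation. A NumPy sketch under the assumed R-style semantics (every column of one term's encoding times every column of the other, row by row); this illustrates the idea, not this module's exact column ordering:

```python
import numpy as np

a = np.array([[1.0], [2.0], [3.0]])                   # numeric term, 1 column
b = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])    # categorical term, 2 contrast columns

# All pairwise column products per row: (n, i) x (n, j) -> (n, i*j)
interaction = np.einsum('ni,nj->nij', a, b).reshape(len(a), -1)
```

For these inputs the interaction columns are [1, 0], [0, 2], [0, 0]: each observation's numeric value scales its active contrast column.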
evaluate_star
evaluate_star(term: Binary, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool) -> tuple[TermResult, bool]

Evaluate a * term (main effects + interaction): a * b = a + b + a:b.

Parameters:

term (Binary): Binary AST node with STAR operator. Required.
data (DataFrame): Polars DataFrame. Required.
factors (dict[str, list[str]]): Current factor levels mapping. Required.
contrast_matrices (dict[str, NDArray]): Current contrast matrices. Required.
transform_state (dict[str, dict]): Current transform state. Required.
transforms (dict[str, object]): Current fitted transforms. Required.
custom_contrasts (dict[str, NDArray]): User-provided contrast matrices. Required.
intercept_absorbed (bool): Whether intercept df has been absorbed. Required.

Returns:

tuple[TermResult, bool]: (TermResult, updated_intercept_absorbed).

evaluate_term
evaluate_term(term: object, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool) -> tuple[TermResult, bool]

Evaluate a single formula term against data.

Dispatches to evaluate_variable, evaluate_call, evaluate_interaction, etc. based on AST node type.

Parameters:

term (object): AST node representing the term. Required.
data (DataFrame): Polars DataFrame with training data. Required.
factors (dict[str, list[str]]): Current factor levels mapping. Required.
contrast_matrices (dict[str, NDArray]): Current contrast matrices mapping. Required.
transform_state (dict[str, dict]): Current transform state mapping. Required.
transforms (dict[str, object]): Current fitted transform instances mapping. Required.
custom_contrasts (dict[str, NDArray]): User-provided contrast matrices. Required.
intercept_absorbed (bool): Whether intercept df has been absorbed. Required.

Returns:

tuple[TermResult, bool]: (TermResult, updated_intercept_absorbed).

evaluate_variable
evaluate_variable(name: str, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool) -> TermResult

Evaluate a simple variable reference.

Parameters:

name (str): Column name. Required.
data (DataFrame): Polars DataFrame. Required.
factors (dict[str, list[str]]): Current factor levels mapping. Required.
contrast_matrices (dict[str, NDArray]): Current contrast matrices. Required.
custom_contrasts (dict[str, NDArray]): User-provided contrast matrices. Required.
intercept_absorbed (bool): Whether intercept df has been absorbed. Required.

Returns:

TermResult: TermResult with evaluated data.

evaluate_contrast

Contrast function evaluation for formula-syntax categorical encoding.

Handles formula expressions like treatment(x, ref=B), sum(x, omit=A), helmert(x, [low, med, high]), poly(x, [lo, hi], degree=2).

Each contrast function maps to a builder from design.coding and a label generator. The function name in the formula IS the encoding scheme.

Functions:

evaluate_contrast_call: Evaluate a contrast encoding function call.

Functions

evaluate_contrast_call
evaluate_contrast_call(call: Call, func_name: str, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool, *, spans_intercept: bool | None = None) -> tuple['TermResult', bool]

Evaluate a contrast encoding function call.

Dispatches to the appropriate contrast matrix builder based on the function name (treatment, sum, helmert, sequential, poly).

Parameters:

call (Call): Call AST node (e.g., treatment(x, ref=B)). Required.
func_name (str): Canonical function name (already resolved from aliases). Required.
data (DataFrame): Polars DataFrame with training data. Required.
factors (dict[str, list[str]]): Current factor levels mapping. Required.
contrast_matrices (dict[str, NDArray]): Current contrast matrices mapping. Required.
custom_contrasts (dict[str, NDArray]): User-provided contrast matrices (ndarray overrides). Required.
intercept_absorbed (bool): Whether intercept df has been absorbed. Required.
spans_intercept (bool | None): Whether this categorical should span the intercept. If None, determined automatically from intercept_absorbed. Default: None.

Returns:

tuple[TermResult, bool]: (TermResult, updated_intercept_absorbed).

evaluate_newdata

Newdata evaluation — apply learned encoding to new observations.

Functions:

evaluate_newdata: Apply learned encoding from FormulaSpec to new data.

Functions

evaluate_newdata
evaluate_newdata(spec: FormulaSpec, data: pl.DataFrame, *, on_unseen_level: str = 'error') -> NDArray[np.float64]

Apply learned encoding from FormulaSpec to new data.

Pure function: reads factor levels, contrast matrices, and transform state from spec. No mutation.

Parameters:

spec (FormulaSpec): FormulaSpec with learned encoding from build_design_matrices(). Required.
data (DataFrame): New data as Polars DataFrame. Required.
on_unseen_level (str): How to handle unseen categorical levels: "error" raises ValueError, "warn" warns and encodes as zeros, "ignore" silently encodes as zeros. Default: 'error'.

Returns:

NDArray[float64]: X matrix for new observations, shape (n_new, n_features). Column order matches the original build_design_matrices() output.
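
The unseen-level policy can be sketched for a single categorical column. encode_newdata_sketch is a hypothetical helper, not this module's function; it only illustrates the documented error/warn/ignore behavior:

```python
import warnings
import numpy as np

def encode_newdata_sketch(values, levels, contrast, on_unseen_level="error"):
    """Known levels are encoded via the learned contrast rows; unseen
    levels error, warn, or silently encode as zeros (assumed logic)."""
    index = {lvl: i for i, lvl in enumerate(levels)}
    out = np.zeros((len(values), contrast.shape[1]))
    for row, v in enumerate(values):
        if v in index:
            out[row] = contrast[index[v]]
        elif on_unseen_level == "error":
            raise ValueError(f"Unseen level {v!r}")
        elif on_unseen_level == "warn":
            warnings.warn(f"Unseen level {v!r}; encoding as zeros")
        # "ignore": the row stays all zeros
    return out
```

With treatment coding learned on levels ['A', 'B', 'C'], an unseen 'D' raises under "error" and becomes an all-zero row under "ignore".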

evaluate_transforms

Transform evaluation — stateful, math, and polynomial transforms.

Extracted from evaluate.py to keep file sizes manageable and provide a clean home for nested transform evaluation logic.

Functions:

evaluate_math_transform: Evaluate a math transform like log(), sqrt().
evaluate_stateful_transform: Evaluate a stateful transform like center(), scale(), rank().
resolve_transform_arg: Resolve a transform argument to raw data, handling nested calls.

Functions

evaluate_math_transform
evaluate_math_transform(func_name: str, arg: object, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray]) -> TermResult

Evaluate a math transform like log(), sqrt().

Supports nested calls: log(rank(x)) evaluates rank(x) first, then applies log to the result.

Parameters:

func_name (str): Transform name. Required.
arg (object): AST node for the argument (Variable or nested Call). Required.
data (DataFrame): Polars DataFrame. Required.
factors (dict[str, list[str]]): Current factor levels mapping. Required.
contrast_matrices (dict[str, NDArray]): Current contrast matrices. Required.
transform_state (dict[str, dict]): Current transform state mapping. Required.
transforms (dict[str, object]): Current fitted transform instances. Required.
custom_contrasts (dict[str, NDArray]): User-provided contrast matrices. Required.

Returns:

TermResult: TermResult with transformed data.

evaluate_stateful_transform
evaluate_stateful_transform(func_name: str, arg: object, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray]) -> TermResult

Evaluate a stateful transform like center(), scale(), rank().

Supports nested calls: zscore(rank(x)) evaluates rank(x) first, then applies zscore to the result.

Parameters:

func_name (str): Transform name. Required.
arg (object): AST node for the argument (Variable or nested Call). Required.
data (DataFrame): Polars DataFrame. Required.
factors (dict[str, list[str]]): Current factor levels mapping. Required.
contrast_matrices (dict[str, NDArray]): Current contrast matrices. Required.
transform_state (dict[str, dict]): Current transform state mapping. Required.
transforms (dict[str, object]): Current fitted transform instances. Required.
custom_contrasts (dict[str, NDArray]): User-provided contrast matrices. Required.

Returns:

TypeDescription
TermResultTermResult with transformed data and state updates.
resolve_transform_arg
resolve_transform_arg(arg: object, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray]) -> tuple[NDArray[np.float64], str, dict]

Resolve a transform argument to raw data, handling nested calls.

If the argument is a simple variable, fetches it from the DataFrame. If the argument is a nested Call (e.g. rank(x) inside zscore(rank(x))), recursively evaluates the inner call first.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `arg` | `object` | AST node — Variable, QuotedName, or Call. | *required* |
| `data` | `DataFrame` | Polars DataFrame. | *required* |
| `factors` | `dict[str, list[str]]` | Current factor levels mapping. | *required* |
| `contrast_matrices` | `dict[str, NDArray]` | Current contrast matrices. | *required* |
| `transform_state` | `dict[str, dict]` | Current transform state mapping. | *required* |
| `transforms` | `dict[str, object]` | Current fitted transform instances. | *required* |
| `custom_contrasts` | `dict[str, NDArray]` | User-provided contrast matrices. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `tuple[NDArray[float64], str, dict]` | `(raw_data, label, inner_state_updates)`, where `raw_data` is a 1-D float64 array, `label` is the display name (e.g. `"x"` or `"rank(x)"`), and `inner_state_updates` is any state produced by inner transforms. |
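
The inner-first recursion can be sketched in miniature. The snippet below is a hypothetical stand-in — toy `Variable`/`Call` nodes and a tiny transform registry, not the library's actual types — showing how an inner `rank(x)` is resolved before the outer transform runs, and how inner state updates propagate outward:

```python
from dataclasses import dataclass

# Minimal stand-ins for the parser's AST nodes (hypothetical shapes).
@dataclass
class Variable:
    name: str

@dataclass
class Call:
    callee: str
    arg: object

# Toy transform registry: each returns (values, state) so state from
# inner transforms can be merged into the caller's state dict.
def _rank(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    return ranks, {"kind": "rank"}

def _center(xs):
    mu = sum(xs) / len(xs)
    return [x - mu for x in xs], {"kind": "center", "mean": mu}

TRANSFORMS = {"rank": _rank, "center": _center}

def resolve_arg(arg, data):
    """Return (raw_data, label, inner_state) for a Variable or nested Call."""
    if isinstance(arg, Variable):
        return list(map(float, data[arg.name])), arg.name, {}
    # Nested call: evaluate the inner transform first, then apply this one.
    inner_vals, inner_label, inner_state = resolve_arg(arg.arg, data)
    vals, state = TRANSFORMS[arg.callee](inner_vals)
    label = f"{arg.callee}({inner_label})"
    return vals, label, {**inner_state, label: state}
```

So `center(rank(x))` first ranks the column, then centers the ranks, and the returned state carries entries for both `rank(x)` and `center(rank(x))`.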

helpers

Shared AST utilities for formula operations.

Functions:

| Name | Description |
| --- | --- |
| `contains_pipe` | Check if an AST node contains a PIPE operator (random effect). |
| `extract_name` | Extract variable name or literal value from AST node. |
| `variable_not_found_error` | Create informative error for missing variable. |

Functions

contains_pipe
contains_pipe(node: object) -> bool

Check if an AST node contains a PIPE operator (random effect).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `node` | `object` | AST node to check. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `bool` | True if node contains a PIPE operator. |
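
A minimal sketch of such a recursive walk, using hypothetical stand-in node classes rather than the library's actual AST types:

```python
from dataclasses import dataclass

# Hypothetical minimal AST nodes; the real ones live in parser/expr.
@dataclass
class Binary:
    left: object
    operator: str  # token kind, e.g. "PLUS", "PIPE"
    right: object

@dataclass
class Variable:
    name: str

def contains_pipe(node) -> bool:
    """Recursively check whether any Binary node uses the PIPE operator."""
    if isinstance(node, Binary):
        if node.operator == "PIPE":
            return True
        return contains_pipe(node.left) or contains_pipe(node.right)
    return False  # leaf nodes (Variable, Literal, ...) cannot contain a pipe
```

For a formula like `x + (1 | g)`, the `+` node itself is not a pipe, but the walk finds the `PIPE` node on its right-hand side.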
extract_name
extract_name(node: object) -> str | None

Extract variable name or literal value from AST node.

Handles interaction terms (a:b) by recursively extracting names from both sides and joining with ‘:’.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `node` | `object` | AST node to extract name from. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `str \| None` | Variable name string, or None if not extractable. |
variable_not_found_error
variable_not_found_error(name: str, data_columns: list[str]) -> ValueError

Create informative error for missing variable.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Variable name that was not found. | *required* |
| `data_columns` | `list[str]` | List of available column names in the data. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `ValueError` | ValueError with helpful message including available columns and "did you mean?" suggestions. |
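
One plausible way to build such an error, using the standard library's `difflib` for the "did you mean?" suggestions — a sketch, not necessarily the library's actual implementation:

```python
import difflib

def variable_not_found_error(name: str, data_columns: list[str]) -> ValueError:
    """Build a ValueError listing available columns plus close-match hints."""
    msg = (
        f"Variable '{name}' not found in data. "
        f"Available columns: {', '.join(data_columns)}."
    )
    # Fuzzy-match the missing name against the real columns.
    suggestions = difflib.get_close_matches(name, data_columns, n=3, cutoff=0.6)
    if suggestions:
        msg += f" Did you mean: {', '.join(suggestions)}?"
    return ValueError(msg)
```

Returning the exception (rather than raising it) lets the caller decide where in term evaluation to raise it.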

parse

R-style formula string parsing into FormulaSpec containers.

Classes:

| Name | Description |
| --- | --- |
| `FormulaError` | Exception raised for formula parsing errors. |
| `FormulaStructure` | Data-free formula structure extracted from an AST. |

Functions:

| Name | Description |
| --- | --- |
| `expand_double_verts` | Expand `\|\|` syntax into separate uncorrelated random effects terms. |
| `expand_nested_syntax` | Expand nested `/` syntax into separate crossed random effects terms. |
| `extract_formula_structure` | Extract formula structure from a formula string without data. |
| `parse_formula` | Parse formula and detect categoricals from data. |

Classes

FormulaError
FormulaError(message: str, formula: str | None = None, position: int | None = None) -> None

Bases: ValueError

Exception raised for formula parsing errors.

Provides helpful error messages with pointer to error position.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `message` | `str` | Error description. | *required* |
| `formula` | `str \| None` | The formula that caused the error. | `None` |
| `position` | `int \| None` | Character position of the error (optional). | `None` |

Attributes:

NameTypeDescription
formula
position
Attributes
formula
formula = formula
position
position = position
FormulaStructure

Data-free formula structure extracted from an AST.

Contains the same structural information as a full parse_formula() call but without requiring data for categorical detection. Used by build_model_spec_from_formula() to replace regex-based extraction.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `response_var` | `str \| None` | Response variable name, or None for RHS-only formulas. |
| `response_transform` | `tuple[str, ...] \| None` | Tuple of LHS transforms (innermost-first), or None if no transforms. |
| `fixed_term_names` | `tuple[str, ...]` | Human-readable fixed-effect term names (e.g. `["Intercept", "x", "group"]`). |
| `has_intercept` | `bool` | Whether the formula includes an intercept. |
| `has_random_effects` | `bool` | Whether the formula contains random-effect (`\|`) terms. |
| `random_terms_raw` | `tuple[str, ...]` | Raw string representations of RE terms (e.g. `["(1 \| group)"]`). |
Attributes
fixed_term_names
fixed_term_names: tuple[str, ...]
has_intercept
has_intercept: bool
has_random_effects
has_random_effects: bool
random_terms_raw
random_terms_raw: tuple[str, ...]
response_transform
response_transform: tuple[str, ...] | None
response_var
response_var: str | None

Functions

expand_double_verts
expand_double_verts(formula: str) -> tuple[str, dict]

Expand || syntax into separate uncorrelated random effects terms.

This matches lme4’s expandDoubleVerts() function. The || syntax creates independent (uncorrelated) random effects by expanding to separate terms.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `formula` | `str` | R-style formula string potentially containing `\|\|` in random-effect terms. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `tuple[str, dict]` | A tuple of: the expanded formula string with `\|\|` replaced by separate terms, and a metadata dict tracking which terms came from `\|\|` expansion. |

Examples:

>>> expand_double_verts("y ~ x + (Days || Subject)")
('y ~ x + (1 | Subject) + (0 + Days | Subject)', {...})
>>> expand_double_verts("y ~ x + (1 + x + y || group)")
('y ~ x + (1 | group) + (0 + x | group) + (0 + y | group)', {...})

Note: each term on the left of `||` becomes its own random-effect term with the same grouping factor — an intercept expands to `(1 | g)` and each slope `x` to `(0 + x | g)` — so no correlations are estimated between them.
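
The expansion shown in the examples above can be sketched with a small regex-based rewriter. This is a simplified stand-in (no nested parentheses inside the RE term, no metadata dict), not the library's implementation:

```python
import re

def expand_double_verts(formula: str) -> str:
    """Rewrite each (lhs || g) group as uncorrelated terms, lme4-style."""
    def expand(match: re.Match) -> str:
        lhs, group = match.group(1).strip(), match.group(2).strip()
        parts = [p.strip() for p in lhs.split("+")]
        out = []
        if "0" not in parts:
            out.append(f"(1 | {group})")            # the intercept term
        for p in parts:
            if p not in ("0", "1"):
                out.append(f"(0 + {p} | {group})")  # each slope, uncorrelated
        return " + ".join(out)
    # Match "( ... || ... )"; single-pipe RE terms are left untouched.
    return re.sub(r"\(([^|()]+)\|\|([^)]+)\)", expand, formula)
```

This reproduces the docstring examples: `(Days || Subject)` becomes `(1 | Subject) + (0 + Days | Subject)`.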

expand_nested_syntax
expand_nested_syntax(formula: str) -> tuple[str, dict]

Expand nested / syntax into separate crossed random effects terms.

This matches lme4’s behavior where nested syntax (a/b) is syntactic sugar for separate terms: (1|a/b) expands to (1|a) + (1|a:b).

The : in the grouping factor creates an interaction grouping where each unique combination of levels becomes a separate group.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `formula` | `str` | R-style formula string potentially containing `/` in RE terms. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `tuple[str, dict]` | A tuple of: the expanded formula string with `/` replaced by separate terms, and a metadata dict tracking which terms came from `/` expansion. |

Examples:

>>> expand_nested_syntax("y ~ x + (1|school/class)")
('y ~ x + (1|school) + (1|school:class)', {...})
>>> expand_nested_syntax("y ~ x + (1|a/b/c)")
('y ~ x + (1|a) + (1|a:b) + (1|a:b:c)', {...})
>>> expand_nested_syntax("y ~ x + (Days|Subject/Session)")
('y ~ x + (Days|Subject) + (Days|Subject:Session)', {...})

Note: each level of nesting adds one term — `(e | a/b)` expands to `(e | a) + (e | a:b)`, where `a:b` groups observations by each unique combination of levels of `a` and `b`.
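
The cumulative-interaction expansion in the examples above can be sketched the same way. Again a simplified stand-in (single `|` per RE term, no nested parentheses, no metadata dict), not the library's implementation:

```python
import re

def expand_nested_syntax(formula: str) -> str:
    """Rewrite (expr | a/b/c) as (expr | a) + (expr | a:b) + (expr | a:b:c)."""
    def expand(match: re.Match) -> str:
        expr, grouping = match.group(1).strip(), match.group(2).strip()
        levels = [g.strip() for g in grouping.split("/")]
        # Each term groups by the cumulative interaction of nesting levels.
        terms = [
            f"({expr} | {':'.join(levels[: i + 1])})"
            for i in range(len(levels))
        ]
        return " + ".join(terms)
    # Only match RE terms whose grouping factor contains a "/".
    return re.sub(r"\(([^|()]+)\|([^)|]*/[^)|]*)\)", expand, formula)
```

So `(1|school/class)` becomes `(1 | school) + (1 | school:class)`, matching the lme4 convention described above.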

extract_formula_structure
extract_formula_structure(formula: str) -> FormulaStructure

Extract formula structure from a formula string without data.

Parses the formula into an AST and walks it to extract response variable, fixed-effect term names, intercept presence, and random-effect term strings. Does not require data (no categorical detection).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `formula` | `str` | R-style formula string (e.g. `"y ~ x + (1 \| group)"`). | *required* |

Returns:

| Type | Description |
| --- | --- |
| `FormulaStructure` | FormulaStructure with extracted information. |
parse_formula
parse_formula(formula: str, data: pl.DataFrame, *, factors: dict[str, list[str]] | None = None, custom_contrasts: dict[str, NDArray] | None = None) -> FormulaSpec

Parse formula and detect categoricals from data.

This does parsing + categorical detection but NOT matrix construction. Reuses parser/ for tokenization and AST construction.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `formula` | `str` | R-style formula string (e.g., `"y ~ x + z"`). | *required* |
| `data` | `DataFrame` | Polars DataFrame to detect categoricals from. | *required* |
| `factors` | `dict[str, list[str]] \| None` | Optional dict mapping column names to level orderings. If provided, these orderings are used for categorical encoding. | `None` |
| `custom_contrasts` | `dict[str, NDArray] \| None` | Optional dict of user-provided contrast matrices. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `FormulaSpec` | FormulaSpec with parsed terms and detected factor levels. |

parser

Recursive descent parser for statistical formula strings.

Modules:

| Name | Description |
| --- | --- |
| `expr` | AST expression node types for formula parsing. |
| `parser` | Recursive descent parser for formula strings. |
| `scanner` | Formula string scanner/tokenizer. |
| `token` | Token class for formula parsing. |

Classes:

| Name | Description |
| --- | --- |
| `Assign` | Expression for assignments (e.g., `x=value` in function calls). |
| `Binary` | Expression for binary operations (e.g., `x + y`, `x ~ y`). |
| `Call` | Expression for function calls (e.g., `factor(x)`, `center(y)`). |
| `Grouping` | Expression for parenthesized groups. |
| `ListExpr` | Expression for bracket list literals (e.g., `[low, med, high]`). |
| `Literal` | Expression for literal values (numbers, strings, etc.). |
| `ParseError` | Error raised during formula parsing. |
| `Parser` | Parse a sequence of Tokens and return an abstract syntax tree. |
| `QuotedName` | Expression for back-quoted names (e.g., `` `weird column name!` ``). |
| `ScanError` | Error raised during formula scanning. |
| `Scanner` | Scan formula string and return Tokens. |
| `Token` | Representation of a single Token. |
| `Unary` | Expression for unary operations (e.g., `-x`, `+x`). |
| `Variable` | Expression for variable references. |

Classes

Assign
Assign(name: 'Variable', value: object) -> None

Expression for assignments (e.g., x=value in function calls).

Attributes:

NameTypeDescription
name
value
Attributes
name
name = name
value
value = value
Binary
Binary(left: object, operator: Token, right: object) -> None

Expression for binary operations (e.g., x + y, x ~ y).

Attributes:

NameTypeDescription
left
operator
right
Attributes
left
left = left
operator
operator = operator
right
right = right
Call
Call(callee: object, args: list) -> None

Expression for function calls (e.g., factor(x), center(y)).

Attributes:

NameTypeDescription
args
callee
Attributes
args
args = args
callee
callee = callee
Grouping
Grouping(expression: object) -> None

Expression for parenthesized groups.

Attributes:

NameTypeDescription
expression
Attributes
expression
expression = expression
ListExpr
ListExpr(elements: list[object]) -> None

Expression for bracket list literals (e.g., [low, med, high]).

Used for level ordering in contrast functions like helmert(x, [low, med, high]).

Attributes:

NameTypeDescription
elements
Attributes
elements
elements = elements
Literal
Literal(value: object, lexeme: str | None = None) -> None

Expression for literal values (numbers, strings, etc.).

Attributes:

NameTypeDescription
lexeme
value
Attributes
lexeme
lexeme = lexeme
value
value = value
ParseError

Bases: Exception

Error raised during formula parsing.

Parser
Parser(tokens: list[Token], formula: str = '') -> None

Parse a sequence of Tokens and return an abstract syntax tree.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `tokens` | `list[Token]` | A list of Token objects as returned by `Scanner.scan()`. | *required* |
| `formula` | `str` | The original formula string (for error messages). | `''` |

Functions:

| Name | Description |
| --- | --- |
| `addition` | |
| `advance` | |
| `assignment` | |
| `at_end` | |
| `call` | |
| `check` | Check if current token matches any of the given types. |
| `comparison` | |
| `consume` | Consume the next Token, raising ParseError if it doesn't match. |
| `expression` | |
| `finish_call` | |
| `format_error_context` | Format a parse error with visual pointer to the error location. |
| `interaction` | |
| `match` | Match and consume token if it matches any of the given types. |
| `multiple_interaction` | |
| `multiplication` | |
| `parse` | Parse a sequence of Tokens. |
| `peek` | Return the Token we are about to consume. |
| `previous` | Return the last Token we consumed. |
| `primary` | |
| `random_effect` | |
| `tilde` | |
| `unary` | |

Attributes:

NameTypeDescription
current
formula
tokens
Attributes
current
current = 0
formula
formula = formula
tokens
tokens = tokens
Functions
addition
addition() -> object
advance
advance() -> Token | None
assignment
assignment() -> object
at_end
at_end() -> bool
call
call() -> object
check
check(types: str | list[str]) -> bool

Check if current token matches any of the given types.

comparison
comparison() -> object
consume
consume(kind: str, message: str) -> Token

Consume the next Token, raising ParseError if it doesn’t match.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `kind` | `str` | Expected token kind. | *required* |
| `message` | `str` | Error message if token doesn't match. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `Token` | The consumed token. |
expression
expression() -> object
finish_call
finish_call(expr: object) -> Call
format_error_context
format_error_context(position: int, message: str) -> str

Format a parse error with visual pointer to the error location.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `position` | `int` | Character offset where error occurred. | *required* |
| `message` | `str` | The error description. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `str` | Formatted error message with context and pointer. |
interaction
interaction() -> object
match
match(types: str | list[str]) -> bool

Match and consume token if it matches any of the given types.

multiple_interaction
multiple_interaction() -> object
multiplication
multiplication() -> object
parse
parse() -> object

Parse a sequence of Tokens.

Returns:

| Type | Description |
| --- | --- |
| `object` | An AST expression node representing the parsed formula. |
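
The precedence chain listed above — `tilde`, then `addition`, then `interaction`, each delegating to the next-tighter level — can be illustrated with a toy parser over pre-scanned tokens. This is a hypothetical sketch of the recursive-descent pattern, not the vendored implementation:

```python
# Toy recursive-descent parser over (kind, lexeme) pairs, showing the
# precedence chain tilde -> addition -> interaction -> primary.
class MiniParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.current = 0

    def match(self, kind):
        if self.tokens[self.current][0] == kind:
            self.current += 1
            return True
        return False

    def parse(self):
        return self.tilde()

    def tilde(self):            # lowest precedence: y ~ rhs
        expr = self.addition()
        while self.match("TILDE"):
            expr = ("~", expr, self.addition())
        return expr

    def addition(self):         # x + z binds tighter than ~
        expr = self.interaction()
        while self.match("PLUS"):
            expr = ("+", expr, self.interaction())
        return expr

    def interaction(self):      # a:b binds tightest of the three
        expr = self.primary()
        while self.match("COLON"):
            expr = (":", expr, self.primary())
        return expr

    def primary(self):          # a bare identifier
        kind, lexeme = self.tokens[self.current]
        self.current += 1
        return lexeme

tokens = [("IDENT", "y"), ("TILDE", "~"), ("IDENT", "x"),
          ("PLUS", "+"), ("IDENT", "a"), ("COLON", ":"),
          ("IDENT", "b"), ("EOF", "")]
ast = MiniParser(tokens).parse()
```

Because each level calls the next-tighter one first, `a:b` groups before `+`, which groups before `~`.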
peek
peek() -> Token

Return the Token we are about to consume.

previous
previous() -> Token

Return the last Token we consumed.

primary
primary() -> object
random_effect
random_effect() -> object
tilde
tilde() -> object
unary
unary() -> object
QuotedName
QuotedName(expression: Token) -> None

Expression for back-quoted names (e.g., `` `weird column name!` ``).

Attributes:

NameTypeDescription
expression
Attributes
expression
expression = expression
ScanError

Bases: Exception

Error raised during formula scanning.

Scanner
Scanner(code: str) -> None

Scan formula string and return Tokens.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `code` | `str` | The formula string to scan. | *required* |

Functions:

| Name | Description |
| --- | --- |
| `add_token` | |
| `advance` | |
| `at_end` | |
| `backquote` | |
| `char` | |
| `floatnum` | |
| `identifier` | |
| `match` | |
| `number` | |
| `peek` | |
| `peek_next` | |
| `scan` | Scan formula string. |
| `scan_token` | |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `code` | | |
| `current` | | |
| `start` | | |
| `tokens` | `list[Token]` | |
Attributes
code
code = code
current
current = 0
start
start = 0
tokens
tokens: list[Token] = []
Functions
add_token
add_token(kind: str, literal: object = None) -> None
advance
advance() -> str
at_end
at_end() -> bool
backquote
backquote() -> None
char
char() -> None
floatnum
floatnum() -> None
identifier
identifier() -> None
match
match(expected: str) -> bool
number
number() -> None
peek
peek() -> str
peek_next
peek_next() -> str
scan
scan(add_intercept: bool = True) -> list[Token]

Scan formula string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `add_intercept` | `bool` | Whether to add an implicit intercept. Defaults to True. | `True` |

Returns:

| Type | Description |
| --- | --- |
| `list[Token]` | A list of Token objects. |
scan_token
scan_token() -> None
Token
Token(kind: str, lexeme: str, literal: object = None, position: int = 0) -> None

Representation of a single Token.

Attributes:

| Name | Description |
| --- | --- |
| `kind` | Token type (e.g., "IDENTIFIER", "PLUS", "TILDE"). |
| `lexeme` | The actual string from the source. |
| `literal` | Parsed literal value (for numbers, strings). |
| `position` | Character offset in the original formula string. |
Attributes
kind
kind = kind
lexeme
lexeme = lexeme
literal
literal = literal
position
position = position
Unary
Unary(operator: Token, right: object) -> None

Expression for unary operations (e.g., -x, +x).

Attributes:

NameTypeDescription
operator
right
Attributes
operator
operator = operator
right
right = right
Variable
Variable(name: Token, level: 'Literal | None' = None) -> None

Expression for variable references.

Attributes:

NameTypeDescription
level
name
Attributes
level
level = level
name
name = name

Modules

expr

AST expression node types for formula parsing.

Vendored from formulae library (https://github.com/bambinos/formulae).

Classes:

| Name | Description |
| --- | --- |
| `Assign` | Expression for assignments (e.g., `x=value` in function calls). |
| `Binary` | Expression for binary operations (e.g., `x + y`, `x ~ y`). |
| `Call` | Expression for function calls (e.g., `factor(x)`, `center(y)`). |
| `Grouping` | Expression for parenthesized groups. |
| `ListExpr` | Expression for bracket list literals (e.g., `[low, med, high]`). |
| `Literal` | Expression for literal values (numbers, strings, etc.). |
| `QuotedName` | Expression for back-quoted names (e.g., `` `weird column name!` ``). |
| `Unary` | Expression for unary operations (e.g., `-x`, `+x`). |
| `Variable` | Expression for variable references. |
Classes
Assign
Assign(name: 'Variable', value: object) -> None

Expression for assignments (e.g., x=value in function calls).

Attributes:

NameTypeDescription
name
value
Attributes
name
name = name
value
value = value
Binary
Binary(left: object, operator: Token, right: object) -> None

Expression for binary operations (e.g., x + y, x ~ y).

Attributes:

NameTypeDescription
left
operator
right
Attributes
left
left = left
operator
operator = operator
right
right = right
Call
Call(callee: object, args: list) -> None

Expression for function calls (e.g., factor(x), center(y)).

Attributes:

NameTypeDescription
args
callee
Attributes
args
args = args
callee
callee = callee
Grouping
Grouping(expression: object) -> None

Expression for parenthesized groups.

Attributes:

NameTypeDescription
expression
Attributes
expression
expression = expression
ListExpr
ListExpr(elements: list[object]) -> None

Expression for bracket list literals (e.g., [low, med, high]).

Used for level ordering in contrast functions like helmert(x, [low, med, high]).

Attributes:

NameTypeDescription
elements
Attributes
elements
elements = elements
Literal
Literal(value: object, lexeme: str | None = None) -> None

Expression for literal values (numbers, strings, etc.).

Attributes:

NameTypeDescription
lexeme
value
Attributes
lexeme
lexeme = lexeme
value
value = value
QuotedName
QuotedName(expression: Token) -> None

Expression for back-quoted names (e.g., `` `weird column name!` ``).

Attributes:

NameTypeDescription
expression
Attributes
expression
expression = expression
Unary
Unary(operator: Token, right: object) -> None

Expression for unary operations (e.g., -x, +x).

Attributes:

NameTypeDescription
operator
right
Attributes
operator
operator = operator
right
right = right
Variable
Variable(name: Token, level: 'Literal | None' = None) -> None

Expression for variable references.

Attributes:

NameTypeDescription
level
name
Attributes
level
level = level
name
name = name
parser

Recursive descent parser for formula strings.

Vendored from formulae library (https://github.com/bambinos/formulae).

Classes:

| Name | Description |
| --- | --- |
| `ParseError` | Error raised during formula parsing. |
| `Parser` | Parse a sequence of Tokens and return an abstract syntax tree. |

Functions:

| Name | Description |
| --- | --- |
| `listify` | Wrap non-list objects in a list. |
Classes
ParseError

Bases: Exception

Error raised during formula parsing.

Parser
Parser(tokens: list[Token], formula: str = '') -> None

Parse a sequence of Tokens and return an abstract syntax tree.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `tokens` | `list[Token]` | A list of Token objects as returned by `Scanner.scan()`. | *required* |
| `formula` | `str` | The original formula string (for error messages). | `''` |

Functions:

| Name | Description |
| --- | --- |
| `addition` | |
| `advance` | |
| `assignment` | |
| `at_end` | |
| `call` | |
| `check` | Check if current token matches any of the given types. |
| `comparison` | |
| `consume` | Consume the next Token, raising ParseError if it doesn't match. |
| `expression` | |
| `finish_call` | |
| `format_error_context` | Format a parse error with visual pointer to the error location. |
| `interaction` | |
| `match` | Match and consume token if it matches any of the given types. |
| `multiple_interaction` | |
| `multiplication` | |
| `parse` | Parse a sequence of Tokens. |
| `peek` | Return the Token we are about to consume. |
| `previous` | Return the last Token we consumed. |
| `primary` | |
| `random_effect` | |
| `tilde` | |
| `unary` | |

Attributes:

NameTypeDescription
current
formula
tokens
Attributes
current
current = 0
formula
formula = formula
tokens
tokens = tokens
Functions
addition
addition() -> object
advance
advance() -> Token | None
assignment
assignment() -> object
at_end
at_end() -> bool
call
call() -> object
check
check(types: str | list[str]) -> bool

Check if current token matches any of the given types.

comparison
comparison() -> object
consume
consume(kind: str, message: str) -> Token

Consume the next Token, raising ParseError if it doesn’t match.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `kind` | `str` | Expected token kind. | *required* |
| `message` | `str` | Error message if token doesn't match. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `Token` | The consumed token. |
expression
expression() -> object
finish_call
finish_call(expr: object) -> Call
format_error_context
format_error_context(position: int, message: str) -> str

Format a parse error with visual pointer to the error location.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `position` | `int` | Character offset where error occurred. | *required* |
| `message` | `str` | The error description. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `str` | Formatted error message with context and pointer. |
interaction
interaction() -> object
match
match(types: str | list[str]) -> bool

Match and consume token if it matches any of the given types.

multiple_interaction
multiple_interaction() -> object
multiplication
multiplication() -> object
parse
parse() -> object

Parse a sequence of Tokens.

Returns:

| Type | Description |
| --- | --- |
| `object` | An AST expression node representing the parsed formula. |
peek
peek() -> Token

Return the Token we are about to consume.

previous
previous() -> Token

Return the last Token we consumed.

primary
primary() -> object
random_effect
random_effect() -> object
tilde
tilde() -> object
unary
unary() -> object
Functions
listify
listify(obj: str | list[str] | None) -> list[str]

Wrap non-list objects in a list.

scanner

Formula string scanner/tokenizer.

Vendored from formulae library (https://github.com/bambinos/formulae).

Classes:

| Name | Description |
| --- | --- |
| `ScanError` | Error raised during formula scanning. |
| `Scanner` | Scan formula string and return Tokens. |

Functions:

| Name | Description |
| --- | --- |
| `format_error_context` | Format a scan error with visual pointer to the error location. |
Classes
ScanError

Bases: Exception

Error raised during formula scanning.

Scanner
Scanner(code: str) -> None

Scan formula string and return Tokens.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `code` | `str` | The formula string to scan. | *required* |

Functions:

| Name | Description |
| --- | --- |
| `add_token` | |
| `advance` | |
| `at_end` | |
| `backquote` | |
| `char` | |
| `floatnum` | |
| `identifier` | |
| `match` | |
| `number` | |
| `peek` | |
| `peek_next` | |
| `scan` | Scan formula string. |
| `scan_token` | |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `code` | | |
| `current` | | |
| `start` | | |
| `tokens` | `list[Token]` | |
Attributes
code
code = code
current
current = 0
start
start = 0
tokens
tokens: list[Token] = []
Functions
add_token
add_token(kind: str, literal: object = None) -> None
advance
advance() -> str
at_end
at_end() -> bool
backquote
backquote() -> None
char
char() -> None
floatnum
floatnum() -> None
identifier
identifier() -> None
match
match(expected: str) -> bool
number
number() -> None
peek
peek() -> str
peek_next
peek_next() -> str
scan
scan(add_intercept: bool = True) -> list[Token]

Scan formula string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `add_intercept` | `bool` | Whether to add an implicit intercept. Defaults to True. | `True` |

Returns:

| Type | Description |
| --- | --- |
| `list[Token]` | A list of Token objects. |
scan_token
scan_token() -> None
Functions
format_error_context
format_error_context(formula: str, position: int, message: str) -> str

Format a scan error with visual pointer to the error location.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `formula` | `str` | The original formula string. | *required* |
| `position` | `int` | Character offset where error occurred. | *required* |
| `message` | `str` | The error description. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `str` | Formatted error message with context and pointer. |
token

Token class for formula parsing.

Vendored from formulae library (https://github.com/bambinos/formulae).

Classes:

| Name | Description |
| --- | --- |
| `Token` | Representation of a single Token. |
Classes
Token
Token(kind: str, lexeme: str, literal: object = None, position: int = 0) -> None

Representation of a single Token.

Attributes:

| Name | Description |
| --- | --- |
| `kind` | Token type (e.g., "IDENTIFIER", "PLUS", "TILDE"). |
| `lexeme` | The actual string from the source. |
| `literal` | Parsed literal value (for numbers, strings). |
| `position` | Character offset in the original formula string. |
Attributes
kind
kind = kind
lexeme
lexeme = lexeme
literal
literal = literal
position
position = position

random_effects

Random effects Z matrix construction from FormulaSpec.

Functions:

| Name | Description |
| --- | --- |
| `build_random_effects_from_spec` | Build random effects design matrix from FormulaSpec. |

Functions

build_random_effects_from_spec
build_random_effects_from_spec(spec: FormulaSpec, data: pl.DataFrame) -> RandomEffectsInfo | None

Build random effects design matrix from FormulaSpec.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `spec` | `FormulaSpec` | Parsed formula specification with re_terms. | *required* |
| `data` | `DataFrame` | Training data (Polars DataFrame). | *required* |

Returns:

| Type | Description |
| --- | --- |
| `RandomEffectsInfo \| None` | RandomEffectsInfo with Z matrix and metadata, or None if no RE terms. |
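
For a random-intercept term, the Z matrix is an indicator matrix with one column per group level: row i has a 1 in the column of observation i's group. A pure-Python sketch of this construction (the actual implementation presumably works on Polars/NumPy arrays and handles slopes as well):

```python
def random_intercept_z(groups: list[str]):
    """Build a dense 0/1 indicator matrix for a random-intercept term."""
    levels = list(dict.fromkeys(groups))          # stable first-seen order
    index = {g: j for j, g in enumerate(levels)}
    z = [[0.0] * len(levels) for _ in groups]
    for i, g in enumerate(groups):
        z[i][index[g]] = 1.0                      # row i loads on its group
    return z, levels

z, levels = random_intercept_z(["a", "b", "a", "c"])
```

Random-slope terms extend this by multiplying each indicator column by the slope covariate; real implementations typically store Z sparsely, since each row has only one nonzero per term.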