Formula parsing, design matrix construction, and newdata evaluation.
Call chain:
model(formula, data) -> parse_formula() -> build_bundle_from_data() -> build_design_matrices()
model.predict(newdata=) -> evaluate_newdata() (applies learned encoding to new observations)
Classes:
| Name | Description |
|---|---|
DesignResult | Output of build_design_matrices(). Separates arrays from metadata. |
FormulaError | Exception raised for formula parsing errors. |
TermResult | Result of evaluating one formula term. |
Functions:
| Name | Description |
|---|---|
build_design_matrices | Build X and y matrices from a parsed formula spec. |
build_random_effects_from_spec | Build random effects design matrix from FormulaSpec. |
expand_double_verts | Expand \|\| syntax into separate uncorrelated random effects terms. |
expand_nested_syntax | Expand nested / syntax into separate crossed random effects terms. |
parse_formula | Parse formula and detect categoricals from data. |
Modules:
| Name | Description |
|---|---|
bundle | Data bundle construction from formula and DataFrame. |
contrast_registry | Contrast function registry for explore formulas. |
contrast_specs | Contrast specification resolution for design matrix coding. |
design | Design matrix construction from FormulaSpec. |
encoding | Categorical variable encoding using Polars Enum. |
evaluate | Term evaluation — AST nodes to design matrix columns. |
evaluate_contrast | Contrast function evaluation for formula-syntax categorical encoding. |
evaluate_newdata | Newdata evaluation — apply learned encoding to new observations. |
evaluate_transforms | Transform evaluation — stateful, math, and polynomial transforms. |
helpers | Shared AST utilities for formula operations. |
parse | R-style formula string parsing into FormulaSpec containers. |
parser | Recursive descent parser for statistical formula strings. |
random_effects | Random effects Z matrix construction from FormulaSpec. |
Classes¶
DesignResult¶
Output of build_design_matrices(). Separates arrays from metadata.
Attributes:
| Name | Type | Description |
|---|---|---|
X | NDArray[float64] | Fixed effects design matrix of shape (n_obs, n_features). |
X_labels | tuple[str, ...] | Column names for X matrix. |
y | NDArray[float64] | None | Response vector of shape (n_obs,), or None. |
y_label | str | None | Name of response variable, or None. |
Attributes¶
X¶
X: NDArray[np.float64]
X_labels¶
X_labels: tuple[str, ...]
n_obs¶
n_obs: int
Number of observations.
y¶
y: NDArray[np.float64] | None = None
y_label¶
y_label: str | None = None
FormulaError¶
FormulaError(message: str, formula: str | None = None, position: int | None = None) -> None
Bases: ValueError
Exception raised for formula parsing errors.
Provides helpful error messages with pointer to error position.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
message | str | Error description. | required |
formula | str | None | The formula that caused the error. | None |
position | int | None | Character position of the error (optional). | None |
Attributes:
| Name | Type | Description |
|---|---|---|
formula | str | None | The formula that caused the error, or None. |
position | int | None | Character position of the error, or None. |
Attributes¶
formula¶
formula = formula
position¶
position = position
TermResult¶
Result of evaluating one formula term.
Attributes:
| Name | Type | Description |
|---|---|---|
columns | NDArray[float64] | Data array, shape (n_obs,) or (n_obs, k). |
labels | list[str] | Column names for the result. |
state_updates | dict | Partial state to merge back into accumulators. May contain keys: “factors”, “contrast_matrices”, “contrast_types”, “transform_state”, “transforms”. |
Attributes¶
columns¶
columns: NDArray[np.float64]
labels¶
labels: list[str]
state_updates¶
state_updates: dict
Functions¶
build_design_matrices¶
build_design_matrices(spec: FormulaSpec, data: pl.DataFrame) -> tuple[DesignResult, FormulaSpec]
Build X and y matrices from a parsed formula spec.
Evaluates terms, learns encoding (contrasts, transforms), and returns both the design matrices AND an updated FormulaSpec with learned state.
The returned FormulaSpec has contrast_matrices, contrast_types, and transform_state populated (they may be empty in the input spec from parse_formula if this is the first build).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | FormulaSpec | FormulaSpec from parse_formula(). | required |
data | DataFrame | Polars DataFrame with training data. | required |
Returns:
| Type | Description |
|---|---|
tuple[DesignResult, FormulaSpec] | (DesignResult, FormulaSpec) — matrices + updated spec with learned encoding. |
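The learn-then-apply contract described above can be illustrated with a minimal standalone sketch (not this package's API — the `build`/`apply_newdata` names and the dict-based state are invented for illustration): a stateful encoding like centering learns its parameters once at build time and replays them on new observations, exactly as the returned FormulaSpec carries `transform_state` forward to `evaluate_newdata()`.

```python
# Illustrative sketch of the learn-then-apply pattern behind
# build_design_matrices() / evaluate_newdata(). All names here are
# hypothetical; only the pattern mirrors the documented behavior.

def build(values: list[float]) -> tuple[list[float], dict]:
    """Learn transform state from training data, then apply it."""
    state = {"mean": sum(values) / len(values)}   # learned encoding
    centered = [v - state["mean"] for v in values]
    return centered, state

def apply_newdata(values: list[float], state: dict) -> list[float]:
    """Re-apply the learned state; never re-learn from new data."""
    return [v - state["mean"] for v in values]

train_cols, spec_state = build([1.0, 2.0, 3.0])
new_cols = apply_newdata([10.0], spec_state)
# new data is centered with the *training* mean (2.0), not its own
```

The key point is that the second call reads state only; re-estimating the mean from `newdata` would silently change the model's design matrix.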
build_random_effects_from_spec¶
build_random_effects_from_spec(spec: FormulaSpec, data: pl.DataFrame) -> RandomEffectsInfo | None
Build random effects design matrix from FormulaSpec.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | FormulaSpec | Parsed formula specification with re_terms. | required |
data | DataFrame | Training data (Polars DataFrame). | required |
Returns:
| Type | Description |
|---|---|
RandomEffectsInfo | None | RandomEffectsInfo with Z matrix and metadata, or None if no RE terms. |
expand_double_verts¶
expand_double_verts(formula: str) -> tuple[str, dict]
Expand || syntax into separate uncorrelated random effects terms.
This matches lme4’s expandDoubleVerts() function. The || syntax creates independent (uncorrelated) random effects by expanding to separate terms.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style formula string potentially containing \|\| syntax. | required |
Returns:
| Type | Description |
|---|---|
tuple[str, dict] | A tuple of: - Expanded formula string with \|\| replaced by separate terms - Metadata dict tracking which terms came from \|\| expansion |
Examples:
>>> expand_double_verts("y ~ x + (Days || Subject)")
('y ~ x + (1 | Subject) + (0 + Days | Subject)', {...})
>>> expand_double_verts("y ~ x + (1 + x + y || group)")
('y ~ x + (1 | group) + (0 + x | group) + (0 + y | group)', {...})
Note: The transformation rules are:
(x || g) -> (1 | g) + (0 + x | g)
(1 + x || g) -> (1 | g) + (0 + x | g)
(1 + x + y || g) -> (1 | g) + (0 + x | g) + (0 + y | g)
(0 + x || g) -> (0 + x | g) [no intercept term added]
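The rules above can be sketched as a small string rewrite (a hedged standalone sketch, not the package's actual implementation, which also returns a metadata dict): split the left-hand side of `||` into individual effects, add a `(1 | g)` term unless the intercept was suppressed with `0`, and emit one `(0 + e | g)` term per remaining effect.

```python
import re

# Sketch of the || expansion rules. expand_double_verts_sketch is a
# hypothetical name; the real function also returns expansion metadata.
def expand_double_verts_sketch(formula: str) -> str:
    def rewrite(m: re.Match) -> str:
        lhs, group = m.group(1).strip(), m.group(2).strip()
        parts = [p.strip() for p in lhs.split("+")]
        has_intercept = "0" not in parts          # "0 + x" suppresses it
        effects = [p for p in parts if p not in ("0", "1")]
        out = [f"(1 | {group})"] if has_intercept else []
        out += [f"(0 + {e} | {group})" for e in effects]
        return " + ".join(out)
    # match "(lhs || group)" — lhs may not contain pipes or parens
    return re.sub(r"\(([^|()]+)\|\|([^)]+)\)", rewrite, formula)

print(expand_double_verts_sketch("y ~ x + (Days || Subject)"))
# -> y ~ x + (1 | Subject) + (0 + Days | Subject)
```

Note how `(0 + x || g)` falls through the `has_intercept` check and stays a single uncorrelated slope term, matching the last rule above.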
expand_nested_syntax¶
expand_nested_syntax(formula: str) -> tuple[str, dict]
Expand nested / syntax into separate crossed random effects terms.
This matches lme4’s behavior where nested syntax (a/b) is syntactic sugar for separate terms: (1|a/b) expands to (1|a) + (1|a:b).
The : in the grouping factor creates an interaction grouping where each unique combination of levels becomes a separate group.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style formula string potentially containing / in RE terms. | required |
Returns:
| Type | Description |
|---|---|
tuple[str, dict] | A tuple of: - Expanded formula string with / replaced by separate terms - Metadata dict tracking which terms came from / expansion |
Examples:
>>> expand_nested_syntax("y ~ x + (1|school/class)")
('y ~ x + (1|school) + (1|school:class)', {...})
>>> expand_nested_syntax("y ~ x + (1|a/b/c)")
('y ~ x + (1|a) + (1|a:b) + (1|a:b:c)', {...})
>>> expand_nested_syntax("y ~ x + (Days|Subject/Session)")
('y ~ x + (Days|Subject) + (Days|Subject:Session)', {...})
Note: The transformation rules are:
(1|a/b) -> (1|a) + (1|a:b)
(1|a/b/c) -> (1|a) + (1|a:b) + (1|a:b:c)
(x|a/b) -> (x|a) + (x|a:b)
(1 + x|a/b) -> (1 + x|a) + (1 + x|a:b)
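These rules likewise reduce to a short rewrite, sketched below under the same caveat (hypothetical name, no metadata dict): for each nesting depth, emit one term whose grouping factor is the `:`-joined prefix of the `/`-separated factors.

```python
import re

# Sketch of the nested / expansion rules; expand_nested_sketch is a
# hypothetical stand-in for the real expand_nested_syntax().
def expand_nested_sketch(formula: str) -> str:
    def rewrite(m: re.Match) -> str:
        expr = m.group(1).strip()
        groups = [g.strip() for g in m.group(2).split("/")]
        # (expr|a/b/c) -> (expr|a) + (expr|a:b) + (expr|a:b:c)
        return " + ".join(
            f"({expr}|{':'.join(groups[:depth])})"
            for depth in range(1, len(groups) + 1)
        )
    # match "(expr | g1/g2[/g3...])" — requires at least one "/"
    return re.sub(r"\(([^|()]+)\|([^)|]+/[^)|]+)\)", rewrite, formula)

print(expand_nested_sketch("y ~ x + (1|school/class)"))
# -> y ~ x + (1|school) + (1|school:class)
```

The cumulative `groups[:depth]` prefix is what makes the expansion produce crossed terms over interaction groupings rather than over the raw factors.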
parse_formula¶
parse_formula(formula: str, data: pl.DataFrame, *, factors: dict[str, list[str]] | None = None, custom_contrasts: dict[str, NDArray] | None = None) -> FormulaSpec
Parse formula and detect categoricals from data.
This does parsing + categorical detection but NOT matrix construction. Reuses parser/ for tokenization and AST construction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style formula string (e.g., “y ~ x + z”). | required |
data | DataFrame | Polars DataFrame to detect categoricals from. | required |
factors | dict[str, list[str]] | None | Optional dict mapping column names to level orderings. If provided, these orderings are used for categorical encoding. | None |
custom_contrasts | dict[str, NDArray] | None | Optional dict of user-provided contrast matrices. | None |
Returns:
| Type | Description |
|---|---|
FormulaSpec | FormulaSpec with parsed terms and detected factor levels. |
Modules¶
bundle¶
Data bundle construction from formula and DataFrame.
Orchestrates formula parsing, design matrix construction, missing value handling, weight validation, rank deficiency detection, and random effects metadata to produce a DataBundle. Extracted from model/core.py.
Functions:
| Name | Description |
|---|---|
build_bundle_from_data | Build a DataBundle and learned FormulaSpec from a model spec and data. |
filter_valid_rows | Filter a DataFrame to only valid (non-NA) rows using a boolean mask. |
Classes¶
Functions¶
build_bundle_from_data¶
build_bundle_from_data(*, spec: ModelSpec, formula: str, data: pl.DataFrame, custom_contrasts: dict[str, np.ndarray] | None, weights_col: str | None, offset_col: str | None = None, missing: str) -> tuple[DataBundle, FormulaSpec]
Build a DataBundle and learned FormulaSpec from a model spec and data.
Handles the full pipeline: formula parsing, design matrix construction, missing value handling, weight validation, offset extraction, rank deficiency detection, family-specific response validation, and random effects metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification with parsed formula info. | required |
formula | str | Raw formula string (e.g. "y ~ x + (1 \| group)"). | required |
data | DataFrame | Input data as a Polars DataFrame. | required |
custom_contrasts | dict[str, ndarray] | None | User-specified contrast matrices, or None. | required |
weights_col | str | None | Name of the weights column in data, or None. | required |
offset_col | str | None | Name of the offset column in data, or None. | None |
missing | str | How to handle missing values ("drop" or "fail"). | required |
Returns:
| Type | Description |
|---|---|
tuple[DataBundle, FormulaSpec] | Tuple of (DataBundle, FormulaSpec). The FormulaSpec is needed for consistent newdata evaluation via evaluate_newdata(). |
filter_valid_rows¶
filter_valid_rows(data: pl.DataFrame | None, valid_mask: np.ndarray | None) -> pl.DataFrame | None
Filter a DataFrame to only valid (non-NA) rows using a boolean mask.
Returns the data unchanged if no filtering is needed (data is None, mask is None, or all rows are valid).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data | DataFrame | None | Polars DataFrame to filter, or None. | required |
valid_mask | ndarray | None | Boolean array indicating valid rows, or None. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | None | Filtered DataFrame, or None if data was None. |
contrast_registry¶
Contrast function registry for explore formulas.
Centralizes the vocabulary of contrast function names, aliases, and parameter requirements. Used by the explore parser and contrast dispatch logic to ensure consistent naming.
Functions:
| Name | Description |
|---|---|
resolve_contrast_name | Resolve a contrast function name to its canonical form. |
Attributes:
| Name | Type | Description |
|---|---|---|
CONTRAST_ALIASES | dict[str, str] | |
DEGREE_FUNCTIONS | frozenset[str] | |
MODEL_CONTRAST_FUNCTIONS | frozenset[str] | |
OMIT_FUNCTIONS | frozenset[str] | |
ORDER_DEPENDENT | frozenset[str] | |
REF_FUNCTIONS | frozenset[str] | |
VALID_CONTRAST_FUNCTIONS | frozenset[str] |
Attributes¶
CONTRAST_ALIASES¶
CONTRAST_ALIASES: dict[str, str] = {'pairwise': 'pairwise', 'sequential': 'sequential', 'poly': 'poly', 'treatment': 'treatment', 'dummy': 'treatment', 'sum': 'sum', 'deviation': 'sum', 'helmert': 'helmert'}
DEGREE_FUNCTIONS¶
DEGREE_FUNCTIONS: frozenset[str] = frozenset({'poly'})
MODEL_CONTRAST_FUNCTIONS¶
MODEL_CONTRAST_FUNCTIONS: frozenset[str] = frozenset({k for k, v in CONTRAST_ALIASES.items() if v != 'pairwise'})
OMIT_FUNCTIONS¶
OMIT_FUNCTIONS: frozenset[str] = frozenset({'sum', 'deviation'})
ORDER_DEPENDENT¶
ORDER_DEPENDENT: frozenset[str] = frozenset({'sequential', 'poly', 'helmert'})
REF_FUNCTIONS¶
REF_FUNCTIONS: frozenset[str] = frozenset({'treatment', 'dummy'})
VALID_CONTRAST_FUNCTIONS¶
VALID_CONTRAST_FUNCTIONS: frozenset[str] = frozenset(CONTRAST_ALIASES.keys())
Functions¶
resolve_contrast_name¶
resolve_contrast_name(name: str) -> str
Resolve a contrast function name to its canonical form.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name | str | Contrast function name (may be an alias). | required |
Returns:
| Type | Description |
|---|---|
str | Canonical contrast name. |
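Resolution is a straight lookup against the alias table shown above. A minimal sketch (the function name and error wording here are illustrative, not the package's actual code):

```python
# Sketch of alias resolution against a CONTRAST_ALIASES-style registry.
# The registry contents below are copied from the documented constant.
CONTRAST_ALIASES = {
    "pairwise": "pairwise", "sequential": "sequential", "poly": "poly",
    "treatment": "treatment", "dummy": "treatment",
    "sum": "sum", "deviation": "sum", "helmert": "helmert",
}

def resolve_contrast_name_sketch(name: str) -> str:
    try:
        return CONTRAST_ALIASES[name]
    except KeyError:
        # illustrative error message; the real one may differ
        raise ValueError(f"unknown contrast function: {name!r}") from None

print(resolve_contrast_name_sketch("dummy"))      # -> treatment
print(resolve_contrast_name_sketch("deviation"))  # -> sum
```

Canonical names map to themselves, so resolving an already-canonical name is a no-op.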
contrast_specs¶
Contrast specification resolution for design matrix coding.
Resolves user-facing contrast specs (strings, tuples, ndarrays) into concrete contrast matrices. This is pure validation + dispatch logic extracted from the model class.
resolve_contrast_specs: Validate and resolve contrast specifications
Functions:
| Name | Description |
|---|---|
resolve_contrast_specs | Resolve user-facing contrast specs into concrete contrast matrices. |
validate_constructor_contrasts | Validate the contrasts= kwarg passed to the model constructor. |
Attributes¶
Functions¶
resolve_contrast_specs¶
resolve_contrast_specs(data: pl.DataFrame, contrasts: dict[str, object]) -> dict[str, NDArray[np.float64]]
Resolve user-facing contrast specs into concrete contrast matrices.
Takes the raw contrast specifications provided by the user (strings, tuples, ndarrays) and validates them against the data, returning a dict mapping column names to contrast matrices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data | DataFrame | Polars DataFrame containing the model data. Used to validate column existence and extract factor levels. | required |
contrasts | dict[str, object] | Mapping of column names to contrast specifications. Each value can be: - A string: 'treatment', 'sum', 'helmert', 'poly', or 'sequential' - A tuple: ('treatment', 'B') for treatment coding with ‘B’ as reference, or ('sum', 'A') for sum coding omitting ‘A’ - An ndarray: Custom contrast matrix of shape (n_levels, n_levels - 1) | required |
Returns:
| Type | Description |
|---|---|
dict[str, NDArray[float64]] | Dict mapping column names to contrast matrices (each of shape (n_levels, n_levels - 1)). |
validate_constructor_contrasts¶
validate_constructor_contrasts(contrasts: dict, data: pl.DataFrame | None) -> None
Validate the contrasts= kwarg passed to the model constructor.
Each value must be an ndarray of shape (n_levels, n_levels - 1)
where n_levels is the number of unique values in the column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
contrasts | dict | Dict mapping column names to ndarray contrast matrices. | required |
data | DataFrame | None | The model’s data (may be None for simulation-first). | required |
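The shape rule stated above is easy to sketch in isolation (an illustrative check, not the package's validator — the function name and message are invented): a contrast matrix for a factor with n levels must be (n, n - 1).

```python
import numpy as np

# Sketch of the documented (n_levels, n_levels - 1) shape check.
def check_contrast_shape(levels: list[str], contrast: np.ndarray) -> None:
    n = len(levels)
    if contrast.shape != (n, n - 1):
        raise ValueError(
            f"contrast matrix must have shape ({n}, {n - 1}), "
            f"got {contrast.shape}"
        )

# np.eye(3)[:, 1:] is treatment coding with the first level as reference:
# rows A -> [0, 0], B -> [1, 0], C -> [0, 1]; shape (3, 2) passes.
check_contrast_shape(["A", "B", "C"], np.eye(3)[:, 1:])
```

A square (n, n) matrix would be rejected: with the intercept present, only n - 1 contrast columns are identifiable.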
design¶
Design matrix construction from FormulaSpec.
Classes:
| Name | Description |
|---|---|
DesignResult | Output of build_design_matrices(). Separates arrays from metadata. |
Functions:
| Name | Description |
|---|---|
build_design_matrices | Build X and y matrices from a parsed formula spec. |
Attributes¶
Classes¶
DesignResult¶
Output of build_design_matrices(). Separates arrays from metadata.
Attributes:
| Name | Type | Description |
|---|---|---|
X | NDArray[float64] | Fixed effects design matrix of shape (n_obs, n_features). |
X_labels | tuple[str, ...] | Column names for X matrix. |
y | NDArray[float64] | None | Response vector of shape (n_obs,), or None. |
y_label | str | None | Name of response variable, or None. |
Attributes¶
X¶
X: NDArray[np.float64]
X_labels¶
X_labels: tuple[str, ...]
n_obs¶
n_obs: int
Number of observations.
y¶
y: NDArray[np.float64] | None = None
y_label¶
y_label: str | None = None
Functions¶
build_design_matrices¶
build_design_matrices(spec: FormulaSpec, data: pl.DataFrame) -> tuple[DesignResult, FormulaSpec]
Build X and y matrices from a parsed formula spec.
Evaluates terms, learns encoding (contrasts, transforms), and returns both the design matrices AND an updated FormulaSpec with learned state.
The returned FormulaSpec has contrast_matrices, contrast_types, and transform_state populated (they may be empty in the input spec from parse_formula if this is the first build).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | FormulaSpec | FormulaSpec from parse_formula(). | required |
data | DataFrame | Polars DataFrame with training data. | required |
Returns:
| Type | Description |
|---|---|
tuple[DesignResult, FormulaSpec] | (DesignResult, FormulaSpec) — matrices + updated spec with learned encoding. |
encoding¶
Categorical variable encoding using Polars Enum.
Functions:
| Name | Description |
|---|---|
detect_categoricals | Detect categorical variables from a formula AST. |
detect_levels | Infer level ordering from a non-categorical series. |
encode_categorical | Encode a categorical series using a contrast matrix. |
ensure_enum | Convert columns to Enum type with specified level ordering. |
get_levels | Get the level ordering from an Enum or Categorical series. |
Classes¶
Functions¶
detect_categoricals¶
detect_categoricals(ast: Binary | Call | Variable | object, data: pl.DataFrame) -> dict[str, list[str]]
Detect categorical variables from a formula AST.
Walks the AST to find:
Explicit categorical markers: factor(x), T(x), S(x)
String columns referenced in the formula
For explicit markers, extracts level information if provided. For implicit string columns, infers levels from data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ast | Binary | Call | Variable | object | Parsed formula AST (from parser). | required |
data | DataFrame | DataFrame to check column types against. | required |
Returns:
| Type | Description |
|---|---|
dict[str, list[str]] | Dict mapping column names to ordered level lists. |
Examples:
>>> from parser import Scanner, Parser
>>> import polars as pl
>>> tokens = Scanner('y ~ factor(group) + age').scan()
>>> ast = Parser(tokens).parse()
>>> df = pl.DataFrame({'y': [1, 2], 'group': ['A', 'B'], 'age': [30, 40]})
>>> detect_categoricals(ast, df)
{'group': ['A', 'B']}
detect_levels¶
detect_levels(series: pl.Series) -> list[str]
Infer level ordering from a non-categorical series.
For string columns that haven’t been converted to Enum yet, infers levels by getting unique values and sorting them.
For numeric columns used with factor(), formats integer values without decimal points (e.g., 6.0 becomes “6”).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
series | Series | Polars series (any type). | required |
Returns:
| Type | Description |
|---|---|
list[str] | List of unique values, sorted alphabetically/numerically. |
Examples:
>>> import polars as pl
>>> s = pl.Series('x', ['B', 'A', 'C', 'A'])
>>> detect_levels(s)
['A', 'B', 'C']
>>> s = pl.Series('x', [6.0, 4.0, 8.0, 4.0])
>>> detect_levels(s)
['4', '6', '8']
encode_categorical¶
encode_categorical(series: pl.Series, contrast: NDArray[np.float64]) -> NDArray[np.float64]
Encode a categorical series using a contrast matrix.
Takes a Polars series (must be Enum or Categorical type) and applies contrast encoding by indexing into the contrast matrix with the integer codes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
series | Series | Polars series with Enum or Categorical dtype. | required |
contrast | NDArray[float64] | Contrast matrix of shape (n_levels, n_columns). Row order must match the series’ category order. | required |
Returns:
| Type | Description |
|---|---|
NDArray[float64] | Encoded array of shape (n_obs, n_columns). |
Examples:
>>> import polars as pl
>>> from coding import treatment_coding
>>> series = pl.Series('x', ['B', 'A', 'C']).cast(pl.Enum(['A', 'B', 'C']))
>>> contrast = treatment_coding(['A', 'B', 'C'])
>>> encode_categorical(series, contrast)
array([[1., 0.],
[0., 0.],
[0., 1.]])
ensure_enum¶
ensure_enum(data: pl.DataFrame, factors: dict[str, list[str]]) -> pl.DataFrame
Convert columns to Enum type with specified level ordering.
This function converts string/categorical columns to Polars Enum type, ensuring consistent level ordering. If a column is already an Enum with matching levels, it is left unchanged.
For numeric columns, applies the same formatting as detect_levels() to ensure consistency (e.g., 6.0 -> “6”).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data | DataFrame | DataFrame to modify. | required |
factors | dict[str, list[str]] | Dict mapping column names to ordered level lists. Level order determines reference category (first = reference). | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with specified columns converted to Enum type. |
Examples:
>>> import polars as pl
>>> df = pl.DataFrame({'group': ['B', 'A', 'C', 'A', 'B']})
>>> df = ensure_enum(df, {'group': ['A', 'B', 'C']})
>>> df['group'].dtype
Enum(categories=['A', 'B', 'C'])
get_levels¶
get_levels(series: pl.Series) -> list[str]
Get the level ordering from an Enum or Categorical series.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
series | Series | Polars series with Enum or Categorical dtype. | required |
Returns:
| Type | Description |
|---|---|
list[str] | List of category levels in their defined order. |
Examples:
>>> import polars as pl
>>> s = pl.Series('x', ['B', 'A']).cast(pl.Enum(['A', 'B', 'C']))
>>> get_levels(s)
['A', 'B', 'C']
evaluate¶
Term evaluation — AST nodes to design matrix columns.
Classes:
| Name | Description |
|---|---|
TermResult | Result of evaluating one formula term. |
Functions:
| Name | Description |
|---|---|
evaluate_call | Evaluate a function call term. |
evaluate_categorical | Evaluate a categorical variable with contrast encoding. |
evaluate_interaction | Evaluate an interaction term (a:b). |
evaluate_star | Evaluate a * term (main effects + interaction): a * b = a + b + a:b. |
evaluate_term | Evaluate a single formula term against data. |
evaluate_variable | Evaluate a simple variable reference. |
Attributes¶
Classes¶
TermResult¶
Result of evaluating one formula term.
Attributes:
| Name | Type | Description |
|---|---|---|
columns | NDArray[float64] | Data array, shape (n_obs,) or (n_obs, k). |
labels | list[str] | Column names for the result. |
state_updates | dict | Partial state to merge back into accumulators. May contain keys: “factors”, “contrast_matrices”, “contrast_types”, “transform_state”, “transforms”. |
Attributes¶
columns¶
columns: NDArray[np.float64]
labels¶
labels: list[str]
state_updates¶
state_updates: dict
Functions¶
evaluate_call¶
evaluate_call(call: Call, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool) -> tuple[TermResult, bool]
Evaluate a function call term.
Handles: contrast functions, transform, center/norm/zscore/scale, log/log10/sqrt.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
call | Call | Call AST node. | required |
data | DataFrame | Polars DataFrame. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
transform_state | dict[str, dict] | Current transform state mapping. | required |
transforms | dict[str, object] | Current fitted transform instances. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
intercept_absorbed | bool | Whether intercept df has been absorbed. | required |
Returns:
| Type | Description |
|---|---|
tuple[TermResult, bool] | (TermResult, updated_intercept_absorbed). |
evaluate_categorical¶
evaluate_categorical(name: str, series: pl.Series, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool, *, spans_intercept: bool | None = None, contrast_type: str | None = None) -> TermResult
Evaluate a categorical variable with contrast encoding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name | str | Variable name. | required |
series | Series | Polars series with categorical data. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
intercept_absorbed | bool | Whether intercept df has been absorbed. | required |
spans_intercept | bool | None | Whether this categorical should span the intercept. If None, determined automatically. | None |
contrast_type | str | None | Type of contrast to use. If None, defaults to “treatment”. | None |
Returns:
| Type | Description |
|---|---|
TermResult | TermResult with encoded data and state updates. |
evaluate_interaction¶
evaluate_interaction(term: Binary, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool, *, is_toplevel: bool = True) -> tuple[TermResult, bool]
Evaluate an interaction term (a:b).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
term | Binary | Binary AST node with COLON operator. | required |
data | DataFrame | Polars DataFrame. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
transform_state | dict[str, dict] | Current transform state. | required |
transforms | dict[str, object] | Current fitted transforms. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
intercept_absorbed | bool | Whether intercept df has been absorbed. | required |
is_toplevel | bool | Whether this is a top-level interaction term. | True |
Returns:
| Type | Description |
|---|---|
tuple[TermResult, bool] | (TermResult, updated_intercept_absorbed). |
evaluate_star¶
evaluate_star(term: Binary, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool) -> tuple[TermResult, bool]
Evaluate a * term (main effects + interaction): a * b = a + b + a:b.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
term | Binary | Binary AST node with STAR operator. | required |
data | DataFrame | Polars DataFrame. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
transform_state | dict[str, dict] | Current transform state. | required |
transforms | dict[str, object] | Current fitted transforms. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
intercept_absorbed | bool | Whether intercept df has been absorbed. | required |
Returns:
| Type | Description |
|---|---|
tuple[TermResult, bool] | (TermResult, updated_intercept_absorbed). |
evaluate_term¶
evaluate_term(term: object, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool) -> tuple[TermResult, bool]
Evaluate a single formula term against data.
Dispatches to evaluate_variable, evaluate_call, evaluate_interaction, etc. based on AST node type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
term | object | AST node representing the term. | required |
data | DataFrame | Polars DataFrame with training data. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices mapping. | required |
transform_state | dict[str, dict] | Current transform state mapping. | required |
transforms | dict[str, object] | Current fitted transform instances mapping. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
intercept_absorbed | bool | Whether intercept df has been absorbed. | required |
Returns:
| Type | Description |
|---|---|
tuple[TermResult, bool] | (TermResult, updated_intercept_absorbed). |
evaluate_variable¶
evaluate_variable(name: str, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool) -> TermResult
Evaluate a simple variable reference.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name | str | Column name. | required |
data | DataFrame | Polars DataFrame. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
intercept_absorbed | bool | Whether intercept df has been absorbed. | required |
Returns:
| Type | Description |
|---|---|
TermResult | TermResult with evaluated data. |
evaluate_contrast¶
Contrast function evaluation for formula-syntax categorical encoding.
Handles formula expressions like treatment(x, ref=B), sum(x, omit=A),
helmert(x, [low, med, high]), poly(x, [lo, hi], degree=2).
Each contrast function maps to a builder from design.coding and
a label generator. The function name in the formula IS the encoding scheme.
Functions:
| Name | Description |
|---|---|
evaluate_contrast_call | Evaluate a contrast encoding function call. |
Attributes¶
Classes¶
Functions¶
evaluate_contrast_call¶
evaluate_contrast_call(call: Call, func_name: str, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool, *, spans_intercept: bool | None = None) -> tuple[TermResult, bool]
Evaluate a contrast encoding function call.
Dispatches to the appropriate contrast matrix builder based on the function name (treatment, sum, helmert, sequential, poly).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
call | Call | Call AST node (e.g., treatment(x, ref=B)). | required |
func_name | str | Canonical function name (already resolved from aliases). | required |
data | DataFrame | Polars DataFrame with training data. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices mapping. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices (ndarray overrides). | required |
intercept_absorbed | bool | Whether intercept df has been absorbed. | required |
spans_intercept | bool | None | Whether this categorical should span the intercept. If None, determined automatically from intercept_absorbed. | None |
Returns:
| Type | Description |
|---|---|
tuple[TermResult, bool] | (TermResult, updated_intercept_absorbed). |
evaluate_newdata¶
Newdata evaluation — apply learned encoding to new observations.
Functions:
| Name | Description |
|---|---|
evaluate_newdata | Apply learned encoding from FormulaSpec to new data. |
Attributes¶
Classes¶
Functions¶
evaluate_newdata¶
evaluate_newdata(spec: FormulaSpec, data: pl.DataFrame, *, on_unseen_level: str = 'error') -> NDArray[np.float64]
Apply learned encoding from FormulaSpec to new data.
Pure function: reads factor levels, contrast matrices, and transform state from spec. No mutation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | FormulaSpec | FormulaSpec with learned encoding from build_design_matrices(). | required |
data | DataFrame | New data as Polars DataFrame. | required |
on_unseen_level | str | How to handle unseen categorical levels: "error" raises ValueError (default), "warn" warns and encodes as zeros, "ignore" silently encodes as zeros. | 'error'
Returns:
| Type | Description |
|---|---|
NDArray[float64] | X matrix for new observations, shape (n_new, n_features). Column order matches the original build_design_matrices() output.
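A minimal sketch of the unseen-level policy described above, using plain one-hot encoding against the training levels. The helper name `encode_with_learned_levels` is hypothetical; the real function reads levels and contrasts from the FormulaSpec.

```python
import warnings

def encode_with_learned_levels(values, levels, on_unseen_level="error"):
    # One-hot encode new observations against the *training* levels.
    # Unseen levels follow the documented "error" / "warn" / "ignore"
    # policy: raise, warn-and-zero, or silently zero.
    rows = []
    for v in values:
        if v not in levels:
            if on_unseen_level == "error":
                raise ValueError(f"Unseen categorical level: {v!r}")
            if on_unseen_level == "warn":
                warnings.warn(f"Unseen level {v!r} encoded as zeros")
            rows.append([0.0] * len(levels))
        else:
            rows.append([1.0 if lvl == v else 0.0 for lvl in levels])
    return rows
```

Zero-encoding keeps the column count (and hence the feature order) identical to the training design matrix.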
evaluate_transforms¶
Transform evaluation — stateful, math, and polynomial transforms.
Extracted from evaluate.py to keep file sizes manageable and provide a clean home for nested transform evaluation logic.
Functions:
| Name | Description |
|---|---|
evaluate_math_transform | Evaluate a math transform like log(), sqrt(). |
evaluate_stateful_transform | Evaluate a stateful transform like center(), scale(), rank(). |
resolve_transform_arg | Resolve a transform argument to raw data, handling nested calls. |
Classes¶
Functions¶
evaluate_math_transform¶
evaluate_math_transform(func_name: str, arg: object, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray]) -> TermResultEvaluate a math transform like log(), sqrt().
Supports nested calls: log(rank(x)) evaluates rank(x) first, then applies log to the result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
func_name | str | Transform name. | required |
arg | object | AST node for the argument (Variable or nested Call). | required |
data | DataFrame | Polars DataFrame. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
transform_state | dict[str, dict] | Current transform state mapping. | required |
transforms | dict[str, object] | Current fitted transform instances. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
Returns:
| Type | Description |
|---|---|
TermResult | TermResult with transformed data. |
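The inner-call-first evaluation order can be sketched without the library. Both helpers below are hypothetical stand-ins (`rank` here uses simple 1-based ranks without tie averaging); the point is only the nesting: `rank(x)` is resolved before `log` is applied.

```python
import math

def rank(xs: list[float]) -> list[float]:
    # Simple 1-based ranking (no tie averaging), a stand-in for rank()
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    return ranks

def log_of_rank(xs: list[float]) -> list[float]:
    # log(rank(x)): evaluate the nested call first,
    # then apply the math transform to its result
    return [math.log(v) for v in rank(xs)]
```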
evaluate_stateful_transform¶
evaluate_stateful_transform(func_name: str, arg: object, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray]) -> TermResultEvaluate a stateful transform like center(), scale(), rank().
Supports nested calls: zscore(rank(x)) evaluates rank(x) first, then applies zscore to the result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
func_name | str | Transform name. | required |
arg | object | AST node for the argument (Variable or nested Call). | required |
data | DataFrame | Polars DataFrame. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
transform_state | dict[str, dict] | Current transform state mapping. | required |
transforms | dict[str, object] | Current fitted transform instances. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
Returns:
| Type | Description |
|---|---|
TermResult | TermResult with transformed data and state updates. |
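"Stateful" here means the transform learns statistics on training data and replays them on new data. A minimal sketch of that fit/apply split for a centering transform (helper names hypothetical — the real state lives in the spec's transform_state mapping):

```python
def fit_center(xs: list[float]) -> dict:
    # Fit step: learn the training mean and record it as state
    return {"mean": sum(xs) / len(xs)}

def apply_center(xs: list[float], state: dict) -> list[float]:
    # Apply step: reuse the *learned* mean on new observations,
    # so newdata is centered with training statistics
    return [x - state["mean"] for x in xs]
```

This is why evaluate_newdata() can be a pure function: it only replays state recorded here.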
resolve_transform_arg¶
resolve_transform_arg(arg: object, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray]) -> tuple[NDArray[np.float64], str, dict]Resolve a transform argument to raw data, handling nested calls.
If the argument is a simple variable, fetches it from the DataFrame. If the argument is a nested Call (e.g. rank(x) inside zscore(rank(x))), recursively evaluates the inner call first.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
arg | object | AST node — Variable, QuotedName, or Call. | required |
data | DataFrame | Polars DataFrame. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
transform_state | dict[str, dict] | Current transform state mapping. | required |
transforms | dict[str, object] | Current fitted transform instances. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
Returns:
| Type | Description |
|---|---|
NDArray[float64] | (raw_data, label, inner_state_updates) where raw_data is a 1-D float64 |
str | array, label is the display name (e.g. “x” or “rank(x)”), and |
dict | inner_state_updates is any state produced by inner transforms. |
helpers¶
Shared AST utilities for formula operations.
Functions:
| Name | Description |
|---|---|
contains_pipe | Check if an AST node contains a PIPE operator (random effect). |
extract_name | Extract variable name or literal value from AST node. |
variable_not_found_error | Create informative error for missing variable. |
Classes¶
Functions¶
contains_pipe¶
contains_pipe(node: object) -> boolCheck if an AST node contains a PIPE operator (random effect).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
node | object | AST node to check. | required |
Returns:
| Type | Description |
|---|---|
bool | True if node contains a PIPE operator. |
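The check is a plain recursive walk over Binary nodes. A self-contained sketch with toy AST classes (the real Token/Binary types are defined in the parser module):

```python
from dataclasses import dataclass

@dataclass
class Token:
    kind: str
    lexeme: str

@dataclass
class Binary:
    left: object
    operator: Token
    right: object

def contains_pipe_sketch(node: object) -> bool:
    # Recursively walk Binary nodes; a PIPE operator anywhere
    # in the subtree marks the term as a random effect
    if isinstance(node, Binary):
        if node.operator.kind == "PIPE":
            return True
        return contains_pipe_sketch(node.left) or contains_pipe_sketch(node.right)
    return False
```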
extract_name¶
extract_name(node: object) -> str | NoneExtract variable name or literal value from AST node.
Handles interaction terms (a:b) by recursively extracting names from both sides and joining with ':'.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
node | object | AST node to extract name from. | required |
Returns:
| Type | Description |
|---|---|
str | None | Variable name string, or None if not extractable. |
variable_not_found_error¶
variable_not_found_error(name: str, data_columns: list[str]) -> ValueErrorCreate informative error for missing variable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name | str | Variable name that was not found. | required |
data_columns | list[str] | List of available column names in the data. | required |
Returns:
| Type | Description |
|---|---|
ValueError | ValueError with a helpful message including available columns and "did you mean?" suggestions.
parse¶
R-style formula string parsing into FormulaSpec containers.
Classes:
| Name | Description |
|---|---|
FormulaError | Exception raised for formula parsing errors. |
FormulaStructure | Data-free formula structure extracted from an AST. |
Functions:
| Name | Description |
|---|---|
expand_double_verts | Expand \|\| syntax into separate uncorrelated random effects terms. |
expand_nested_syntax | Expand nested / syntax into separate crossed random effects terms. |
extract_formula_structure | Extract formula structure from a formula string without data. |
parse_formula | Parse formula and detect categoricals from data. |
Attributes¶
Classes¶
FormulaError¶
FormulaError(message: str, formula: str | None = None, position: int | None = None) -> NoneBases: ValueError
Exception raised for formula parsing errors.
Provides helpful error messages with pointer to error position.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
message | str | Error description. | required |
formula | str | None | The formula that caused the error. | None |
position | int | None | Character position of the error (optional). | None |
Attributes:
| Name | Type | Description |
|---|---|---|
formula | ||
position |
Attributes¶
formula¶
formula = formula
position¶
position = position
FormulaStructure¶
Data-free formula structure extracted from an AST.
Contains the same structural information as a full parse_formula()
call but without requiring data for categorical detection. Used by
build_model_spec_from_formula() to replace regex-based extraction.
Attributes:
| Name | Type | Description |
|---|---|---|
response_var | str | None | Response variable name, or None for RHS-only formulas. |
response_transform | tuple[str, ...] | None | Tuple of LHS transforms (innermost-first), or None if no transforms. |
fixed_term_names | tuple[str, ...] | Human-readable fixed-effect term names (e.g. ["Intercept", "x", "group"]). |
has_intercept | bool | Whether the formula includes an intercept. |
has_random_effects | bool | Whether the formula contains random-effects (\|) terms. |
random_terms_raw | tuple[str, ...] | Raw string representations of RE terms (e.g. ["(1 \| group)"]). |
Attributes¶
fixed_term_names¶
fixed_term_names: tuple[str, ...]
has_intercept¶
has_intercept: bool
has_random_effects¶
has_random_effects: bool
random_terms_raw¶
random_terms_raw: tuple[str, ...]
response_transform¶
response_transform: tuple[str, ...] | None
response_var¶
response_var: str | None
Functions¶
expand_double_verts¶
expand_double_verts(formula: str) -> tuple[str, dict]Expand || syntax into separate uncorrelated random effects terms.
This matches lme4’s expandDoubleVerts() function. The || syntax creates independent (uncorrelated) random effects by expanding to separate terms.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style formula string potentially containing \|\| in RE terms. | required
Returns:
| Type | Description |
|---|---|
tuple[str, dict] | A tuple of: - Expanded formula string with \|\| replaced by separate terms - Metadata dict tracking which terms came from \|\| expansion |
Examples:
>>> expand_double_verts("y ~ x + (Days || Subject)")
('y ~ x + (1 | Subject) + (0 + Days | Subject)', {...})
>>> expand_double_verts("y ~ x + (1 + x + y || group)")
('y ~ x + (1 | group) + (0 + x | group) + (0 + y | group)', {...})
Note: The transformation rules are:
(x || g) -> (1 | g) + (0 + x | g)
(1 + x || g) -> (1 | g) + (0 + x | g)
(1 + x + y || g) -> (1 | g) + (0 + x | g) + (0 + y | g)
(0 + x || g) -> (0 + x | g) [no intercept term added]
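The transformation rules above can be sketched with a small regex rewrite. This is a hypothetical simplification — the real function operates alongside the parser and tracks expansion metadata — and it only handles flat term lists without nested parentheses.

```python
import re

def expand_double_verts_sketch(formula: str) -> str:
    # Rewrite each (terms || g) group per the rules above:
    # keep a (1 | g) intercept term unless "0" suppresses it,
    # then emit one uncorrelated (0 + term | g) per slope
    def repl(m: re.Match) -> str:
        lhs, group = m.group(1).strip(), m.group(2).strip()
        parts = [p.strip() for p in lhs.split("+")]
        terms = []
        if "0" not in parts:
            terms.append(f"(1 | {group})")
        for p in parts:
            if p not in ("0", "1"):
                terms.append(f"(0 + {p} | {group})")
        return " + ".join(terms)
    return re.sub(r"\(([^|()]+)\|\|([^)]+)\)", repl, formula)
```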
expand_nested_syntax¶
expand_nested_syntax(formula: str) -> tuple[str, dict]Expand nested / syntax into separate crossed random effects terms.
This matches lme4’s behavior where nested syntax (a/b) is syntactic sugar for separate terms: (1|a/b) expands to (1|a) + (1|a:b).
The : in the grouping factor creates an interaction grouping where each unique combination of levels becomes a separate group.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style formula string potentially containing / in RE terms. | required |
Returns:
| Type | Description |
|---|---|
tuple[str, dict] | A tuple of: - Expanded formula string with / replaced by separate terms - Metadata dict tracking which terms came from / expansion |
Examples:
>>> expand_nested_syntax("y ~ x + (1|school/class)")
('y ~ x + (1|school) + (1|school:class)', {...})
>>> expand_nested_syntax("y ~ x + (1|a/b/c)")
('y ~ x + (1|a) + (1|a:b) + (1|a:b:c)', {...})
>>> expand_nested_syntax("y ~ x + (Days|Subject/Session)")
('y ~ x + (Days|Subject) + (Days|Subject:Session)', {...})
Note: The transformation rules are:
(1|a/b) -> (1|a) + (1|a:b)
(1|a/b/c) -> (1|a) + (1|a:b) + (1|a:b:c)
(x|a/b) -> (x|a) + (x|a:b)
(1 + x|a/b) -> (1 + x|a) + (1 + x|a:b)
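The / expansion rules above are cumulative prefixes joined with `:`. A hypothetical regex sketch (the real implementation also returns expansion metadata and handles edge cases the regex cannot):

```python
import re

def expand_nested_sketch(formula: str) -> str:
    # (lhs|a/b/c) -> (lhs|a) + (lhs|a:b) + (lhs|a:b:c):
    # each term groups on a cumulative prefix of the nesting path
    def repl(m: re.Match) -> str:
        lhs, grouping = m.group(1), m.group(2)
        factors = grouping.split("/")
        return " + ".join(
            f"({lhs}|{':'.join(factors[: i + 1])})" for i in range(len(factors))
        )
    return re.sub(r"\(([^|)]+)\|([^)|]*/[^)|]*)\)", repl, formula)
```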
extract_formula_structure¶
extract_formula_structure(formula: str) -> FormulaStructureExtract formula structure from a formula string without data.
Parses the formula into an AST and walks it to extract response variable, fixed-effect term names, intercept presence, and random-effect term strings. Does not require data (no categorical detection).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style formula string (e.g. "y ~ x + (1 \| group)"). | required
Returns:
| Type | Description |
|---|---|
FormulaStructure | FormulaStructure with extracted information. |
parse_formula¶
parse_formula(formula: str, data: pl.DataFrame, *, factors: dict[str, list[str]] | None = None, custom_contrasts: dict[str, NDArray] | None = None) -> FormulaSpecParse formula and detect categoricals from data.
This does parsing + categorical detection but NOT matrix construction. Reuses parser/ for tokenization and AST construction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style formula string (e.g., “y ~ x + z”). | required |
data | DataFrame | Polars DataFrame to detect categoricals from. | required |
factors | dict[str, list[str]] | None | Optional dict mapping column names to level orderings. If provided, these orderings are used for categorical encoding. | None |
custom_contrasts | dict[str, NDArray] | None | Optional dict of user-provided contrast matrices. | None |
Returns:
| Type | Description |
|---|---|
FormulaSpec | FormulaSpec with parsed terms and detected factor levels. |
parser¶
Recursive descent parser for statistical formula strings.
Modules:
| Name | Description |
|---|---|
expr | AST expression node types for formula parsing. |
parser | Recursive descent parser for formula strings. |
scanner | Formula string scanner/tokenizer. |
token | Token class for formula parsing. |
Classes:
| Name | Description |
|---|---|
Assign | Expression for assignments (e.g., x=value in function calls). |
Binary | Expression for binary operations (e.g., x + y, x ~ y). |
Call | Expression for function calls (e.g., factor(x), center(y)). |
Grouping | Expression for parenthesized groups. |
ListExpr | Expression for bracket list literals (e.g., [low, med, high]). |
Literal | Expression for literal values (numbers, strings, etc.). |
ParseError | Error raised during formula parsing. |
Parser | Parse a sequence of Tokens and return an abstract syntax tree. |
QuotedName | Expression for back-quoted names (e.g., `weird column name!`). |
ScanError | Error raised during formula scanning. |
Scanner | Scan formula string and return Tokens. |
Token | Representation of a single Token. |
Unary | Expression for unary operations (e.g., -x, +x). |
Variable | Expression for variable references. |
Classes¶
Assign¶
Assign(name: 'Variable', value: object) -> NoneExpression for assignments (e.g., x=value in function calls).
Attributes:
| Name | Type | Description |
|---|---|---|
name | ||
value |
Attributes¶
name¶
name = namevalue¶
value = valueBinary¶
Binary(left: object, operator: Token, right: object) -> NoneExpression for binary operations (e.g., x + y, x ~ y).
Attributes:
| Name | Type | Description |
|---|---|---|
left | ||
operator | ||
right |
Attributes¶
left¶
left = leftoperator¶
operator = operatorright¶
right = rightCall¶
Call(callee: object, args: list) -> NoneExpression for function calls (e.g., factor(x), center(y)).
Attributes:
| Name | Type | Description |
|---|---|---|
args | ||
callee |
Attributes¶
args¶
args = argscallee¶
callee = calleeGrouping¶
Grouping(expression: object) -> NoneExpression for parenthesized groups.
Attributes:
| Name | Type | Description |
|---|---|---|
expression |
Attributes¶
expression¶
expression = expressionListExpr¶
ListExpr(elements: list[object]) -> NoneExpression for bracket list literals (e.g., [low, med, high]).
Used for level ordering in contrast functions like helmert(x, [low, med, high]).
Attributes:
| Name | Type | Description |
|---|---|---|
elements |
Attributes¶
elements¶
elements = elementsLiteral¶
Literal(value: object, lexeme: str | None = None) -> NoneExpression for literal values (numbers, strings, etc.).
Attributes:
| Name | Type | Description |
|---|---|---|
lexeme | ||
value |
Attributes¶
lexeme¶
lexeme = lexemevalue¶
value = valueParseError¶
Bases: Exception
Error raised during formula parsing.
Parser¶
Parser(tokens: list[Token], formula: str = '') -> NoneParse a sequence of Tokens and return an abstract syntax tree.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tokens | list[Token] | A list of Token objects as returned by Scanner.scan(). | required |
formula | str | The original formula string (for error messages). | ‘’ |
Functions:
| Name | Description |
|---|---|
addition | |
advance | |
assignment | |
at_end | |
call | |
check | Check if current token matches any of the given types. |
comparison | |
consume | Consume the next Token, raising ParseError if it doesn’t match. |
expression | |
finish_call | |
format_error_context | Format a parse error with visual pointer to the error location. |
interaction | |
match | Match and consume token if it matches any of the given types. |
multiple_interaction | |
multiplication | |
parse | Parse a sequence of Tokens. |
peek | Return the Token we are about to consume. |
previous | Return the last Token we consumed. |
primary | |
random_effect | |
tilde | |
unary |
Attributes:
| Name | Type | Description |
|---|---|---|
current | ||
formula | ||
tokens |
Attributes¶
current¶
current = 0formula¶
formula = formulatokens¶
tokens = tokensFunctions¶
addition¶
addition() -> objectadvance¶
advance() -> Token | Noneassignment¶
assignment() -> objectat_end¶
at_end() -> boolcall¶
call() -> objectcheck¶
check(types: str | list[str]) -> boolCheck if current token matches any of the given types.
comparison¶
comparison() -> objectconsume¶
consume(kind: str, message: str) -> TokenConsume the next Token, raising ParseError if it doesn’t match.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
kind | str | Expected token kind. | required |
message | str | Error message if token doesn’t match. | required |
Returns:
| Type | Description |
|---|---|
Token | The consumed token. |
expression¶
expression() -> objectfinish_call¶
finish_call(expr: object) -> Callformat_error_context¶
format_error_context(position: int, message: str) -> strFormat a parse error with visual pointer to the error location.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
position | int | Character offset where error occurred. | required |
message | str | The error description. | required |
Returns:
| Type | Description |
|---|---|
str | Formatted error message with context and pointer. |
interaction¶
interaction() -> objectmatch¶
match(types: str | list[str]) -> boolMatch and consume token if it matches any of the given types.
multiple_interaction¶
multiple_interaction() -> objectmultiplication¶
multiplication() -> objectparse¶
parse() -> objectParse a sequence of Tokens.
Returns:
| Type | Description |
|---|---|
object | An AST expression node representing the parsed formula. |
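The method list above (tilde, addition, interaction, …) is the classic recursive-descent precedence chain: each level parses the tighter-binding level below it first. A toy, self-contained illustration of that pattern for `~`, `+`, and `:` only (not the real Parser, which handles many more node types):

```python
import re

def tokenize(src: str) -> list[str]:
    # Toy scanner: identifiers plus the ~, +, : operators
    return re.findall(r"[A-Za-z_]\w*|[~+:]", src)

class MiniParser:
    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.current = 0

    def peek(self):
        return self.tokens[self.current] if self.current < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.tokens[self.current]
        self.current += 1
        return tok

    def tilde(self):
        # Loosest-binding level: formula = addition ('~' addition)?
        left = self.addition()
        if self.peek() == "~":
            self.advance()
            return ("~", left, self.addition())
        return left

    def addition(self):
        # addition = interaction ('+' interaction)*
        left = self.interaction()
        while self.peek() == "+":
            self.advance()
            left = ("+", left, self.interaction())
        return left

    def interaction(self):
        # Tightest-binding level: interaction = name (':' name)*
        left = self.advance()
        while self.peek() == ":":
            self.advance()
            left = (":", left, self.advance())
        return left
```

Because `interaction` is called from `addition`, `a:b` binds tighter than `+`, exactly as in R formulas.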
peek¶
peek() -> TokenReturn the Token we are about to consume.
previous¶
previous() -> TokenReturn the last Token we consumed.
primary¶
primary() -> objectrandom_effect¶
random_effect() -> objecttilde¶
tilde() -> objectunary¶
unary() -> objectQuotedName¶
QuotedName(expression: Token) -> NoneExpression for back-quoted names (e.g., `weird column name!`).
Attributes:
| Name | Type | Description |
|---|---|---|
expression |
Attributes¶
expression¶
expression = expressionScanError¶
Bases: Exception
Error raised during formula scanning.
Scanner¶
Scanner(code: str) -> NoneScan formula string and return Tokens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
code | str | The formula string to scan. | required |
Functions:
| Name | Description |
|---|---|
add_token | |
advance | |
at_end | |
backquote | |
char | |
floatnum | |
identifier | |
match | |
number | |
peek | |
peek_next | |
scan | Scan formula string. |
scan_token |
Attributes:
| Name | Type | Description |
|---|---|---|
code | ||
current | ||
start | ||
tokens | list[Token] |
Attributes¶
code¶
code = codecurrent¶
current = 0start¶
start = 0tokens¶
tokens: list[Token] = []Functions¶
add_token¶
add_token(kind: str, literal: object = None) -> Noneadvance¶
advance() -> strat_end¶
at_end() -> boolbackquote¶
backquote() -> Nonechar¶
char() -> Nonefloatnum¶
floatnum() -> Noneidentifier¶
identifier() -> Nonematch¶
match(expected: str) -> boolnumber¶
number() -> Nonepeek¶
peek() -> strpeek_next¶
peek_next() -> strscan¶
scan(add_intercept: bool = True) -> list[Token]Scan formula string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
add_intercept | bool | Whether to add an implicit intercept. Defaults to True. | True |
Returns:
| Type | Description |
|---|---|
list[Token] | A list of Token objects. |
scan_token¶
scan_token() -> NoneToken¶
Token(kind: str, lexeme: str, literal: object = None, position: int = 0) -> NoneRepresentation of a single Token.
Attributes:
| Name | Type | Description |
|---|---|---|
kind | Token type (e.g., “IDENTIFIER”, “PLUS”, “TILDE”). | |
lexeme | The actual string from the source. | |
literal | Parsed literal value (for numbers, strings). | |
position | Character offset in the original formula string. |
Attributes¶
kind¶
kind = kindlexeme¶
lexeme = lexemeliteral¶
literal = literalposition¶
position = positionUnary¶
Unary(operator: Token, right: object) -> NoneExpression for unary operations (e.g., -x, +x).
Attributes:
| Name | Type | Description |
|---|---|---|
operator | ||
right |
Attributes¶
operator¶
operator = operatorright¶
right = rightVariable¶
Variable(name: Token, level: 'Literal | None' = None) -> NoneExpression for variable references.
Attributes:
| Name | Type | Description |
|---|---|---|
level | ||
name |
Attributes¶
level¶
level = levelname¶
name = nameModules¶
expr¶
AST expression node types for formula parsing.
Vendored from the formulae library.
Classes:
| Name | Description |
|---|---|
Assign | Expression for assignments (e.g., x=value in function calls). |
Binary | Expression for binary operations (e.g., x + y, x ~ y). |
Call | Expression for function calls (e.g., factor(x), center(y)). |
Grouping | Expression for parenthesized groups. |
ListExpr | Expression for bracket list literals (e.g., [low, med, high]). |
Literal | Expression for literal values (numbers, strings, etc.). |
QuotedName | Expression for back-quoted names (e.g., `weird column name!`). |
Unary | Expression for unary operations (e.g., -x, +x). |
Variable | Expression for variable references. |
Classes¶
Assign¶
Assign(name: 'Variable', value: object) -> NoneExpression for assignments (e.g., x=value in function calls).
Attributes:
| Name | Type | Description |
|---|---|---|
name | ||
value |
Attributes¶
name¶
name = namevalue¶
value = valueBinary¶
Binary(left: object, operator: Token, right: object) -> NoneExpression for binary operations (e.g., x + y, x ~ y).
Attributes:
| Name | Type | Description |
|---|---|---|
left | ||
operator | ||
right |
Attributes¶
left¶
left = leftoperator¶
operator = operatorright¶
right = rightCall¶
Call(callee: object, args: list) -> NoneExpression for function calls (e.g., factor(x), center(y)).
Attributes:
| Name | Type | Description |
|---|---|---|
args | ||
callee |
Attributes¶
args¶
args = argscallee¶
callee = calleeGrouping¶
Grouping(expression: object) -> NoneExpression for parenthesized groups.
Attributes:
| Name | Type | Description |
|---|---|---|
expression |
Attributes¶
expression¶
expression = expressionListExpr¶
ListExpr(elements: list[object]) -> NoneExpression for bracket list literals (e.g., [low, med, high]).
Used for level ordering in contrast functions like helmert(x, [low, med, high]).
Attributes:
| Name | Type | Description |
|---|---|---|
elements |
Attributes¶
elements¶
elements = elementsLiteral¶
Literal(value: object, lexeme: str | None = None) -> NoneExpression for literal values (numbers, strings, etc.).
Attributes:
| Name | Type | Description |
|---|---|---|
lexeme | ||
value |
Attributes¶
lexeme¶
lexeme = lexemevalue¶
value = valueQuotedName¶
QuotedName(expression: Token) -> NoneExpression for back-quoted names (e.g., `weird column name!`).
Attributes:
| Name | Type | Description |
|---|---|---|
expression |
Attributes¶
expression¶
expression = expressionUnary¶
Unary(operator: Token, right: object) -> NoneExpression for unary operations (e.g., -x, +x).
Attributes:
| Name | Type | Description |
|---|---|---|
operator | ||
right |
Attributes¶
operator¶
operator = operatorright¶
right = rightVariable¶
Variable(name: Token, level: 'Literal | None' = None) -> NoneExpression for variable references.
Attributes:
| Name | Type | Description |
|---|---|---|
level | ||
name |
Attributes¶
level¶
level = levelname¶
name = nameparser¶
Recursive descent parser for formula strings.
Vendored from the formulae library.
Classes:
| Name | Description |
|---|---|
ParseError | Error raised during formula parsing. |
Parser | Parse a sequence of Tokens and return an abstract syntax tree. |
Functions:
| Name | Description |
|---|---|
listify | Wrap non-list objects in a list. |
Classes¶
ParseError¶
Bases: Exception
Error raised during formula parsing.
Parser¶
Parser(tokens: list[Token], formula: str = '') -> NoneParse a sequence of Tokens and return an abstract syntax tree.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tokens | list[Token] | A list of Token objects as returned by Scanner.scan(). | required |
formula | str | The original formula string (for error messages). | ‘’ |
Functions:
| Name | Description |
|---|---|
addition | |
advance | |
assignment | |
at_end | |
call | |
check | Check if current token matches any of the given types. |
comparison | |
consume | Consume the next Token, raising ParseError if it doesn’t match. |
expression | |
finish_call | |
format_error_context | Format a parse error with visual pointer to the error location. |
interaction | |
match | Match and consume token if it matches any of the given types. |
multiple_interaction | |
multiplication | |
parse | Parse a sequence of Tokens. |
peek | Return the Token we are about to consume. |
previous | Return the last Token we consumed. |
primary | |
random_effect | |
tilde | |
unary |
Attributes:
| Name | Type | Description |
|---|---|---|
current | ||
formula | ||
tokens |
Attributes¶
current¶
current = 0formula¶
formula = formulatokens¶
tokens = tokensFunctions¶
addition¶
addition() -> objectadvance¶
advance() -> Token | Noneassignment¶
assignment() -> objectat_end¶
at_end() -> boolcall¶
call() -> objectcheck¶
check(types: str | list[str]) -> boolCheck if current token matches any of the given types.
comparison¶
comparison() -> objectconsume¶
consume(kind: str, message: str) -> TokenConsume the next Token, raising ParseError if it doesn’t match.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
kind | str | Expected token kind. | required |
message | str | Error message if token doesn’t match. | required |
Returns:
| Type | Description |
|---|---|
Token | The consumed token. |
expression¶
expression() -> objectfinish_call¶
finish_call(expr: object) -> Callformat_error_context¶
format_error_context(position: int, message: str) -> strFormat a parse error with visual pointer to the error location.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
position | int | Character offset where error occurred. | required |
message | str | The error description. | required |
Returns:
| Type | Description |
|---|---|
str | Formatted error message with context and pointer. |
interaction¶
interaction() -> objectmatch¶
match(types: str | list[str]) -> boolMatch and consume token if it matches any of the given types.
multiple_interaction¶
multiple_interaction() -> objectmultiplication¶
multiplication() -> objectparse¶
parse() -> objectParse a sequence of Tokens.
Returns:
| Type | Description |
|---|---|
object | An AST expression node representing the parsed formula. |
peek¶
peek() -> TokenReturn the Token we are about to consume.
previous¶
previous() -> TokenReturn the last Token we consumed.
primary¶
primary() -> objectrandom_effect¶
random_effect() -> objecttilde¶
tilde() -> objectunary¶
unary() -> objectFunctions¶
listify¶
listify(obj: str | list[str] | None) -> list[str]Wrap non-list objects in a list.
scanner¶
Formula string scanner/tokenizer.
Vendored from the formulae library.
Classes:
| Name | Description |
|---|---|
ScanError | Error raised during formula scanning. |
Scanner | Scan formula string and return Tokens. |
Functions:
| Name | Description |
|---|---|
format_error_context | Format a scan error with visual pointer to the error location. |
Classes¶
ScanError¶
Bases: Exception
Error raised during formula scanning.
Scanner¶
Scanner(code: str) -> NoneScan formula string and return Tokens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
code | str | The formula string to scan. | required |
Functions:
| Name | Description |
|---|---|
add_token | |
advance | |
at_end | |
backquote | |
char | |
floatnum | |
identifier | |
match | |
number | |
peek | |
peek_next | |
scan | Scan formula string. |
scan_token |
Attributes:
| Name | Type | Description |
|---|---|---|
code | ||
current | ||
start | ||
tokens | list[Token] |
Attributes¶
code¶
code = codecurrent¶
current = 0start¶
start = 0tokens¶
tokens: list[Token] = []Functions¶
add_token¶
add_token(kind: str, literal: object = None) -> Noneadvance¶
advance() -> strat_end¶
at_end() -> boolbackquote¶
backquote() -> Nonechar¶
char() -> Nonefloatnum¶
floatnum() -> Noneidentifier¶
identifier() -> Nonematch¶
match(expected: str) -> boolnumber¶
number() -> Nonepeek¶
peek() -> strpeek_next¶
peek_next() -> strscan¶
scan(add_intercept: bool = True) -> list[Token]Scan formula string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
add_intercept | bool | Whether to add an implicit intercept. Defaults to True. | True |
Returns:
| Type | Description |
|---|---|
list[Token] | A list of Token objects. |
scan_token¶
scan_token() -> NoneFunctions¶
format_error_context¶
format_error_context(formula: str, position: int, message: str) -> strFormat a scan error with visual pointer to the error location.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | The original formula string. | required |
position | int | Character offset where error occurred. | required |
message | str | The error description. | required |
Returns:
| Type | Description |
|---|---|
str | Formatted error message with context and pointer. |
token¶
Token class for formula parsing.
Vendored from the formulae library.
Classes:
| Name | Description |
|---|---|
Token | Representation of a single Token. |
Classes¶
Token¶
Token(kind: str, lexeme: str, literal: object = None, position: int = 0) -> NoneRepresentation of a single Token.
Attributes:
| Name | Type | Description |
|---|---|---|
kind | Token type (e.g., “IDENTIFIER”, “PLUS”, “TILDE”). | |
lexeme | The actual string from the source. | |
literal | Parsed literal value (for numbers, strings). | |
position | Character offset in the original formula string. |
Attributes¶
kind¶
kind = kindlexeme¶
lexeme = lexemeliteral¶
literal = literalposition¶
position = positionrandom_effects¶
Random effects Z matrix construction from FormulaSpec.
Functions:
| Name | Description |
|---|---|
build_random_effects_from_spec | Build random effects design matrix from FormulaSpec. |
Classes¶
Functions¶
build_random_effects_from_spec¶
build_random_effects_from_spec(spec: FormulaSpec, data: pl.DataFrame) -> RandomEffectsInfo | NoneBuild random effects design matrix from FormulaSpec.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | FormulaSpec | Parsed formula specification with re_terms. | required |
data | DataFrame | Training data (Polars DataFrame). | required |
Returns:
| Type | Description |
|---|---|
RandomEffectsInfo | None | RandomEffectsInfo with Z matrix and metadata, or None if no RE terms. |
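For the simplest case, a `(1 | group)` term, Z is an indicator matrix: one column per group level, one row per observation. A hypothetical stdlib sketch (the real builder also handles random slopes and multiple RE terms, and returns a RandomEffectsInfo):

```python
def random_intercept_z(groups: list[str]) -> tuple[list[list[float]], list[str]]:
    # Z for a (1 | group) term: rows are observations, columns are
    # group levels, with a 1.0 marking each observation's group
    levels = sorted(set(groups))
    z = [[1.0 if g == lvl else 0.0 for lvl in levels] for g in groups]
    return z, levels
```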