Formula parsing, design matrix construction, and newdata evaluation.
Call chain:
model(formula, data) -> parse_formula() -> build_bundle_from_data() -> build_design_matrices()
model.predict(newdata=) -> evaluate_newdata() (applies learned encoding to new observations)
Classes:
| Name | Description |
|---|---|
DesignResult | Output of build_design_matrices(). Separates arrays from metadata. |
FormulaError | Exception raised for formula parsing errors. |
TermResult | Result of evaluating one formula term. |
Functions:
| Name | Description |
|---|---|
build_design_matrices | Build X and y matrices from a parsed formula spec. |
build_random_effects_from_spec | Build random effects design matrix from FormulaSpec. |
expand_double_verts | Expand \|\| syntax into separate uncorrelated random effects terms. |
expand_nested_syntax | Expand nested / syntax into separate crossed random effects terms. |
parse_formula | Parse formula and detect categoricals from data. |
Modules:
| Name | Description |
|---|---|
bundle | Data bundle construction from formula and DataFrame. |
contrast_registry | Contrast function registry for explore formulas. |
contrast_specs | Contrast specification resolution for design matrix coding. |
design | Design matrix construction from FormulaSpec. |
encoding | Categorical variable encoding using Polars Enum. |
evaluate | Term evaluation — AST nodes to design matrix columns. |
evaluate_contrast | Contrast function evaluation for formula-syntax categorical encoding. |
evaluate_newdata | Newdata evaluation — apply learned encoding to new observations. |
evaluate_transforms | Transform evaluation — stateful, math, and polynomial transforms. |
helpers | Shared AST utilities for formula operations. |
parse | R-style formula string parsing into FormulaSpec containers. |
parser | Recursive descent parser for statistical formula strings. |
random_effects | Random effects Z matrix construction from FormulaSpec. |
Classes¶
DesignResult¶
Output of build_design_matrices(). Separates arrays from metadata.
Attributes:
| Name | Type | Description |
|---|---|---|
X | NDArray[float64] | Fixed effects design matrix of shape (n_obs, n_features). |
X_labels | tuple[str, ...] | Column names for X matrix. |
y | NDArray[float64] | None | Response vector of shape (n_obs,), or None. |
y_label | str | None | Name of response variable, or None. |
Attributes¶
X¶
X: NDArray[np.float64]
X_labels¶
X_labels: tuple[str, ...]
n_obs¶
n_obs: int
Number of observations.
y¶
y: NDArray[np.float64] | None = None
y_label¶
y_label: str | None = None
FormulaError¶
FormulaError(message: str, formula: str | None = None, position: int | None = None) -> None
Bases: ValueError
Exception raised for formula parsing errors.
Provides helpful error messages with pointer to error position.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
message | str | Error description. | required |
formula | str | None | The formula that caused the error. | None |
position | int | None | Character position of the error (optional). | None |
Attributes:
| Name | Type | Description |
|---|---|---|
formula | str | None | The formula that caused the error, or None. |
position | int | None | Character position of the error, or None. |
Attributes¶
formula¶
formula = formula
position¶
position = position
TermResult¶
Result of evaluating one formula term.
Attributes:
| Name | Type | Description |
|---|---|---|
columns | NDArray[float64] | Data array, shape (n_obs,) or (n_obs, k). |
labels | list[str] | Column names for the result. |
state_updates | dict | Partial state to merge back into accumulators. May contain keys: “factors”, “contrast_matrices”, “contrast_types”, “transform_state”, “transforms”. |
Attributes¶
columns¶
columns: NDArray[np.float64]
labels¶
labels: list[str]
state_updates¶
state_updates: dict
Functions¶
build_design_matrices¶
build_design_matrices(spec: FormulaSpec, data: pl.DataFrame) -> tuple[DesignResult, FormulaSpec]
Build X and y matrices from a parsed formula spec.
Evaluates terms, learns encoding (contrasts, transforms), and returns both the design matrices AND an updated FormulaSpec with learned state.
The returned FormulaSpec has contrast_matrices, contrast_types, and transform_state populated (they may be empty in the input spec from parse_formula if this is the first build).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | FormulaSpec | FormulaSpec from parse_formula(). | required |
data | DataFrame | Polars DataFrame with training data. | required |
Returns:
| Type | Description |
|---|---|
tuple[DesignResult, FormulaSpec] | (DesignResult, FormulaSpec) — matrices + updated spec with learned encoding. |
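The learn-then-apply contract described above can be illustrated with a minimal standalone sketch (not this package's API — the `build`/`apply_newdata` names and the dict-based state are invented for illustration): a stateful encoding like centering learns its parameters once at build time and replays them on new observations, exactly as the returned FormulaSpec carries `transform_state` forward to `evaluate_newdata()`.

```python
# Illustrative sketch of the learn-then-apply pattern behind
# build_design_matrices() / evaluate_newdata(). All names here are
# hypothetical; only the pattern mirrors the documented behavior.

def build(values: list[float]) -> tuple[list[float], dict]:
    """Learn transform state from training data, then apply it."""
    state = {"mean": sum(values) / len(values)}   # learned encoding
    centered = [v - state["mean"] for v in values]
    return centered, state

def apply_newdata(values: list[float], state: dict) -> list[float]:
    """Re-apply the learned state; never re-learn from new data."""
    return [v - state["mean"] for v in values]

train_cols, spec_state = build([1.0, 2.0, 3.0])
new_cols = apply_newdata([10.0], spec_state)
# new data is centered with the *training* mean (2.0), not its own
```

The key point is that the second call reads state only; re-estimating the mean from `newdata` would silently change the model's design matrix.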
build_random_effects_from_spec¶
build_random_effects_from_spec(spec: FormulaSpec, data: pl.DataFrame) -> RandomEffectsInfo | None
Build random effects design matrix from FormulaSpec.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | FormulaSpec | Parsed formula specification with re_terms. | required |
data | DataFrame | Training data (Polars DataFrame). | required |
Returns:
| Type | Description |
|---|---|
RandomEffectsInfo | None | RandomEffectsInfo with Z matrix and metadata, or None if no RE terms. |
expand_double_verts¶
expand_double_verts(formula: str) -> tuple[str, dict]
Expand || syntax into separate uncorrelated random effects terms.
This matches lme4’s expandDoubleVerts() function. The || syntax creates independent (uncorrelated) random effects by expanding to separate terms.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style formula string potentially containing \|\| syntax. | required |
Returns:
| Type | Description |
|---|---|
tuple[str, dict] | A tuple of: - Expanded formula string with \|\| replaced by separate terms - Metadata dict tracking which terms came from \|\| expansion |
Examples:
>>> expand_double_verts("y ~ x + (Days || Subject)")
('y ~ x + (1 | Subject) + (0 + Days | Subject)', {...})
>>> expand_double_verts("y ~ x + (1 + x + y || group)")
('y ~ x + (1 | group) + (0 + x | group) + (0 + y | group)', {...})
Note: The transformation rules are:
(x || g) -> (1 | g) + (0 + x | g)
(1 + x || g) -> (1 | g) + (0 + x | g)
(1 + x + y || g) -> (1 | g) + (0 + x | g) + (0 + y | g)
(0 + x || g) -> (0 + x | g) [no intercept term added]
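The rules above can be sketched as a small string rewrite (a hedged standalone sketch, not the package's actual implementation, which also returns a metadata dict): split the left-hand side of `||` into individual effects, add a `(1 | g)` term unless the intercept was suppressed with `0`, and emit one `(0 + e | g)` term per remaining effect.

```python
import re

# Sketch of the || expansion rules. expand_double_verts_sketch is a
# hypothetical name; the real function also returns expansion metadata.
def expand_double_verts_sketch(formula: str) -> str:
    def rewrite(m: re.Match) -> str:
        lhs, group = m.group(1).strip(), m.group(2).strip()
        parts = [p.strip() for p in lhs.split("+")]
        has_intercept = "0" not in parts          # "0 + x" suppresses it
        effects = [p for p in parts if p not in ("0", "1")]
        out = [f"(1 | {group})"] if has_intercept else []
        out += [f"(0 + {e} | {group})" for e in effects]
        return " + ".join(out)
    # match "(lhs || group)" — lhs may not contain pipes or parens
    return re.sub(r"\(([^|()]+)\|\|([^)]+)\)", rewrite, formula)

print(expand_double_verts_sketch("y ~ x + (Days || Subject)"))
# -> y ~ x + (1 | Subject) + (0 + Days | Subject)
```

Note how `(0 + x || g)` falls through the `has_intercept` check and stays a single uncorrelated slope term, matching the last rule above.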
expand_nested_syntax¶
expand_nested_syntax(formula: str) -> tuple[str, dict]
Expand nested / syntax into separate crossed random effects terms.
This matches lme4’s behavior where nested syntax (a/b) is syntactic sugar for separate terms: (1|a/b) expands to (1|a) + (1|a:b).
The : in the grouping factor creates an interaction grouping where each unique combination of levels becomes a separate group.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style formula string potentially containing / in RE terms. | required |
Returns:
| Type | Description |
|---|---|
tuple[str, dict] | A tuple of: - Expanded formula string with / replaced by separate terms - Metadata dict tracking which terms came from / expansion |
Examples:
>>> expand_nested_syntax("y ~ x + (1|school/class)")
('y ~ x + (1|school) + (1|school:class)', {...})
>>> expand_nested_syntax("y ~ x + (1|a/b/c)")
('y ~ x + (1|a) + (1|a:b) + (1|a:b:c)', {...})
>>> expand_nested_syntax("y ~ x + (Days|Subject/Session)")
('y ~ x + (Days|Subject) + (Days|Subject:Session)', {...})
Note: The transformation rules are:
(1|a/b) -> (1|a) + (1|a:b)
(1|a/b/c) -> (1|a) + (1|a:b) + (1|a:b:c)
(x|a/b) -> (x|a) + (x|a:b)
(1 + x|a/b) -> (1 + x|a) + (1 + x|a:b)
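These rules likewise reduce to a short rewrite, sketched below under the same caveat (hypothetical name, no metadata dict): for each nesting depth, emit one term whose grouping factor is the `:`-joined prefix of the `/`-separated factors.

```python
import re

# Sketch of the nested / expansion rules; expand_nested_sketch is a
# hypothetical stand-in for the real expand_nested_syntax().
def expand_nested_sketch(formula: str) -> str:
    def rewrite(m: re.Match) -> str:
        expr = m.group(1).strip()
        groups = [g.strip() for g in m.group(2).split("/")]
        # (expr|a/b/c) -> (expr|a) + (expr|a:b) + (expr|a:b:c)
        return " + ".join(
            f"({expr}|{':'.join(groups[:depth])})"
            for depth in range(1, len(groups) + 1)
        )
    # match "(expr | g1/g2[/g3...])" — requires at least one "/"
    return re.sub(r"\(([^|()]+)\|([^)|]+/[^)|]+)\)", rewrite, formula)

print(expand_nested_sketch("y ~ x + (1|school/class)"))
# -> y ~ x + (1|school) + (1|school:class)
```

The cumulative `groups[:depth]` prefix is what makes the expansion produce crossed terms over interaction groupings rather than over the raw factors.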
parse_formula¶
parse_formula(formula: str, data: pl.DataFrame, *, factors: dict[str, list[str]] | None = None, custom_contrasts: dict[str, NDArray] | None = None) -> FormulaSpec
Parse formula and detect categoricals from data.
This does parsing + categorical detection but NOT matrix construction. Reuses parser/ for tokenization and AST construction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style formula string (e.g., “y ~ x + z”). | required |
data | DataFrame | Polars DataFrame to detect categoricals from. | required |
factors | dict[str, list[str]] | None | Optional dict mapping column names to level orderings. If provided, these orderings are used for categorical encoding. | None |
custom_contrasts | dict[str, NDArray] | None | Optional dict of user-provided contrast matrices. | None |
Returns:
| Type | Description |
|---|---|
FormulaSpec | FormulaSpec with parsed terms and detected factor levels. |
Modules¶
bundle¶
Data bundle construction from formula and DataFrame.
Orchestrates formula parsing, design matrix construction, missing value handling, weight validation, rank deficiency detection, and random effects metadata to produce a DataBundle. Extracted from model/core.py.
Functions:
| Name | Description |
|---|---|
build_bundle_from_data | Build a DataBundle and learned FormulaSpec from a model spec and data. |
filter_valid_rows | Filter a DataFrame to only valid (non-NA) rows using a boolean mask. |
Classes¶
Functions¶
build_bundle_from_data¶
build_bundle_from_data(*, spec: ModelSpec, formula: str, data: pl.DataFrame, custom_contrasts: dict[str, np.ndarray] | None, weights_col: str | None, offset_col: str | None = None, missing: str) -> tuple[DataBundle, FormulaSpec]
Build a DataBundle and learned FormulaSpec from a model spec and data.
Handles the full pipeline: formula parsing, design matrix construction, missing value handling, weight validation, offset extraction, rank deficiency detection, family-specific response validation, and random effects metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | ModelSpec | Model specification with parsed formula info. | required |
formula | str | Raw formula string (e.g. "y ~ x + (1 \| group)"). | required |
data | DataFrame | Input data as a Polars DataFrame. | required |
custom_contrasts | dict[str, ndarray] | None | User-specified contrast matrices, or None. | required |
weights_col | str | None | Name of the weights column in data, or None. | required |
offset_col | str | None | Name of the offset column in data, or None. | None |
missing | str | How to handle missing values ("drop" or "fail"). | required |
Returns:
| Type | Description |
|---|---|
tuple[DataBundle, FormulaSpec] | Tuple of (DataBundle, FormulaSpec). The FormulaSpec is needed for consistent newdata evaluation via evaluate_newdata(). |
filter_valid_rows¶
filter_valid_rows(data: pl.DataFrame | None, valid_mask: np.ndarray | None) -> pl.DataFrame | None
Filter a DataFrame to only valid (non-NA) rows using a boolean mask.
Returns the data unchanged if no filtering is needed (data is None, mask is None, or all rows are valid).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data | DataFrame | None | Polars DataFrame to filter, or None. | required |
valid_mask | ndarray | None | Boolean array indicating valid rows, or None. | required |
Returns:
| Type | Description |
|---|---|
DataFrame | None | Filtered DataFrame, or None if data was None. |
contrast_registry¶
Contrast function registry for explore formulas.
Centralizes the vocabulary of contrast function names, aliases, and parameter requirements. Used by the explore parser and contrast dispatch logic to ensure consistent naming.
Functions:
| Name | Description |
|---|---|
resolve_contrast_name | Resolve a contrast function name to its canonical form. |
Attributes:
| Name | Type | Description |
|---|---|---|
CONTRAST_ALIASES | dict[str, str] | |
DEGREE_FUNCTIONS | frozenset[str] | |
MODEL_CONTRAST_FUNCTIONS | frozenset[str] | |
OMIT_FUNCTIONS | frozenset[str] | |
ORDER_DEPENDENT | frozenset[str] | |
REF_FUNCTIONS | frozenset[str] | |
VALID_CONTRAST_FUNCTIONS | frozenset[str] |
Attributes¶
CONTRAST_ALIASES¶
CONTRAST_ALIASES: dict[str, str] = {'pairwise': 'pairwise', 'sequential': 'sequential', 'poly': 'poly', 'treatment': 'treatment', 'dummy': 'treatment', 'sum': 'sum', 'deviation': 'sum', 'helmert': 'helmert'}
DEGREE_FUNCTIONS¶
DEGREE_FUNCTIONS: frozenset[str] = frozenset({'poly'})
MODEL_CONTRAST_FUNCTIONS¶
MODEL_CONTRAST_FUNCTIONS: frozenset[str] = frozenset({k for k, v in CONTRAST_ALIASES.items() if v != 'pairwise'})
OMIT_FUNCTIONS¶
OMIT_FUNCTIONS: frozenset[str] = frozenset({'sum', 'deviation'})
ORDER_DEPENDENT¶
ORDER_DEPENDENT: frozenset[str] = frozenset({'sequential', 'poly', 'helmert'})
REF_FUNCTIONS¶
REF_FUNCTIONS: frozenset[str] = frozenset({'treatment', 'dummy'})
VALID_CONTRAST_FUNCTIONS¶
VALID_CONTRAST_FUNCTIONS: frozenset[str] = frozenset(CONTRAST_ALIASES.keys())
Functions¶
resolve_contrast_name¶
resolve_contrast_name(name: str) -> str
Resolve a contrast function name to its canonical form.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name | str | Contrast function name (may be an alias). | required |
Returns:
| Type | Description |
|---|---|
str | Canonical contrast name. |
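Resolution is a straight lookup against the alias table shown above. A minimal sketch (the function name and error wording here are illustrative, not the package's actual code):

```python
# Sketch of alias resolution against a CONTRAST_ALIASES-style registry.
# The registry contents below are copied from the documented constant.
CONTRAST_ALIASES = {
    "pairwise": "pairwise", "sequential": "sequential", "poly": "poly",
    "treatment": "treatment", "dummy": "treatment",
    "sum": "sum", "deviation": "sum", "helmert": "helmert",
}

def resolve_contrast_name_sketch(name: str) -> str:
    try:
        return CONTRAST_ALIASES[name]
    except KeyError:
        # illustrative error message; the real one may differ
        raise ValueError(f"unknown contrast function: {name!r}") from None

print(resolve_contrast_name_sketch("dummy"))      # -> treatment
print(resolve_contrast_name_sketch("deviation"))  # -> sum
```

Canonical names map to themselves, so resolving an already-canonical name is a no-op.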
contrast_specs¶
Contrast specification resolution for design matrix coding.
Resolves user-facing contrast specs (strings, tuples, ndarrays) into concrete contrast matrices. This is pure validation + dispatch logic extracted from the model class.
resolve_contrast_specs: Validate and resolve contrast specifications
Functions:
| Name | Description |
|---|---|
resolve_contrast_specs | Resolve user-facing contrast specs into concrete contrast matrices. |
validate_constructor_contrasts | Validate the contrasts= kwarg passed to the model constructor. |
Attributes¶
Functions¶
resolve_contrast_specs¶
resolve_contrast_specs(data: pl.DataFrame, contrasts: dict[str, object]) -> dict[str, NDArray[np.float64]]
Resolve user-facing contrast specs into concrete contrast matrices.
Takes the raw contrast specifications provided by the user (strings, tuples, ndarrays) and validates them against the data, returning a dict mapping column names to contrast matrices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data | DataFrame | Polars DataFrame containing the model data. Used to validate column existence and extract factor levels. | required |
contrasts | dict[str, object] | Mapping of column names to contrast specifications. Each value can be: - A string: 'treatment', 'sum', 'helmert', 'poly', or 'sequential' - A tuple: ('treatment', 'B') for treatment coding with ‘B’ as reference, or ('sum', 'A') for sum coding omitting ‘A’ - An ndarray: Custom contrast matrix of shape (n_levels, n_levels - 1) | required |
Returns:
| Type | Description |
|---|---|
dict[str, NDArray[float64]] | Dict mapping column names to contrast matrices (each of shape (n_levels, n_levels - 1)). |
validate_constructor_contrasts¶
validate_constructor_contrasts(contrasts: dict, data: pl.DataFrame | None) -> None
Validate the contrasts= kwarg passed to the model constructor.
Each value must be an ndarray of shape (n_levels, n_levels - 1)
where n_levels is the number of unique values in the column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
contrasts | dict | Dict mapping column names to ndarray contrast matrices. | required |
data | DataFrame | None | The model’s data (may be None for simulation-first). | required |
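The shape rule stated above is easy to sketch in isolation (an illustrative check, not the package's validator — the function name and message are invented): a contrast matrix for a factor with n levels must be (n, n - 1).

```python
import numpy as np

# Sketch of the documented (n_levels, n_levels - 1) shape check.
def check_contrast_shape(levels: list[str], contrast: np.ndarray) -> None:
    n = len(levels)
    if contrast.shape != (n, n - 1):
        raise ValueError(
            f"contrast matrix must have shape ({n}, {n - 1}), "
            f"got {contrast.shape}"
        )

# np.eye(3)[:, 1:] is treatment coding with the first level as reference:
# rows A -> [0, 0], B -> [1, 0], C -> [0, 1]; shape (3, 2) passes.
check_contrast_shape(["A", "B", "C"], np.eye(3)[:, 1:])
```

A square (n, n) matrix would be rejected: with the intercept present, only n - 1 contrast columns are identifiable.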
design¶
Design matrix construction from FormulaSpec.
Classes:
| Name | Description |
|---|---|
DesignResult | Output of build_design_matrices(). Separates arrays from metadata. |
Functions:
| Name | Description |
|---|---|
build_design_matrices | Build X and y matrices from a parsed formula spec. |
Attributes¶
Classes¶
DesignResult¶
Output of build_design_matrices(). Separates arrays from metadata.
Attributes:
| Name | Type | Description |
|---|---|---|
X | NDArray[float64] | Fixed effects design matrix of shape (n_obs, n_features). |
X_labels | tuple[str, ...] | Column names for X matrix. |
y | NDArray[float64] | None | Response vector of shape (n_obs,), or None. |
y_label | str | None | Name of response variable, or None. |
Attributes¶
X¶
X: NDArray[np.float64]
X_labels¶
X_labels: tuple[str, ...]
n_obs¶
n_obs: int
Number of observations.
y¶
y: NDArray[np.float64] | None = None
y_label¶
y_label: str | None = None
Functions¶
build_design_matrices¶
build_design_matrices(spec: FormulaSpec, data: pl.DataFrame) -> tuple[DesignResult, FormulaSpec]
Build X and y matrices from a parsed formula spec.
Evaluates terms, learns encoding (contrasts, transforms), and returns both the design matrices AND an updated FormulaSpec with learned state.
The returned FormulaSpec has contrast_matrices, contrast_types, and transform_state populated (they may be empty in the input spec from parse_formula if this is the first build).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | FormulaSpec | FormulaSpec from parse_formula(). | required |
data | DataFrame | Polars DataFrame with training data. | required |
Returns:
| Type | Description |
|---|---|
tuple[DesignResult, FormulaSpec] | (DesignResult, FormulaSpec) — matrices + updated spec with learned encoding. |
encoding¶
Categorical variable encoding using Polars Enum.
Functions:
| Name | Description |
|---|---|
detect_categoricals | Detect categorical variables from a formula AST. |
detect_levels | Infer level ordering from a non-categorical series. |
encode_categorical | Encode a categorical series using a contrast matrix. |
ensure_enum | Convert columns to Enum type with specified level ordering. |
get_levels | Get the level ordering from an Enum or Categorical series. |
Classes¶
Functions¶
detect_categoricals¶
detect_categoricals(ast: Binary | Call | Variable | object, data: pl.DataFrame) -> dict[str, list[str]]
Detect categorical variables from a formula AST.
Walks the AST to find:
Explicit categorical markers: factor(x), T(x), S(x)
String columns referenced in the formula
For explicit markers, extracts level information if provided. For implicit string columns, infers levels from data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ast | Binary | Call | Variable | object | Parsed formula AST (from parser). | required |
data | DataFrame | DataFrame to check column types against. | required |
Returns:
| Type | Description |
|---|---|
dict[str, list[str]] | Dict mapping column names to ordered level lists. |
Examples:
>>> from parser import Scanner, Parser
>>> import polars as pl
>>> tokens = Scanner('y ~ factor(group) + age').scan()
>>> ast = Parser(tokens).parse()
>>> df = pl.DataFrame({'y': [1, 2], 'group': ['A', 'B'], 'age': [30, 40]})
>>> detect_categoricals(ast, df)
{'group': ['A', 'B']}
detect_levels¶
detect_levels(series: pl.Series) -> list[str]
Infer level ordering from a non-categorical series.
For string columns that haven’t been converted to Enum yet, infers levels by getting unique values and sorting them.
For numeric columns used with factor(), formats integer values without decimal points (e.g., 6.0 becomes “6”).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
series | Series | Polars series (any type). | required |
Returns:
| Type | Description |
|---|---|
list[str] | List of unique values, sorted alphabetically/numerically. |
Examples:
>>> import polars as pl
>>> s = pl.Series('x', ['B', 'A', 'C', 'A'])
>>> detect_levels(s)
['A', 'B', 'C']
>>> s = pl.Series('x', [6.0, 4.0, 8.0, 4.0])
>>> detect_levels(s)
['4', '6', '8']
encode_categorical¶
encode_categorical(series: pl.Series, contrast: NDArray[np.float64]) -> NDArray[np.float64]
Encode a categorical series using a contrast matrix.
Takes a Polars series (must be Enum or Categorical type) and applies contrast encoding by indexing into the contrast matrix with the integer codes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
series | Series | Polars series with Enum or Categorical dtype. | required |
contrast | NDArray[float64] | Contrast matrix of shape (n_levels, n_columns). Row order must match the series’ category order. | required |
Returns:
| Type | Description |
|---|---|
NDArray[float64] | Encoded array of shape (n_obs, n_columns). |
Examples:
>>> import polars as pl
>>> from coding import treatment_coding
>>> series = pl.Series('x', ['B', 'A', 'C']).cast(pl.Enum(['A', 'B', 'C']))
>>> contrast = treatment_coding(['A', 'B', 'C'])
>>> encode_categorical(series, contrast)
array([[1., 0.],
[0., 0.],
[0., 1.]])
ensure_enum¶
ensure_enum(data: pl.DataFrame, factors: dict[str, list[str]]) -> pl.DataFrame
Convert columns to Enum type with specified level ordering.
This function converts string/categorical columns to Polars Enum type, ensuring consistent level ordering. If a column is already an Enum with matching levels, it is left unchanged.
For numeric columns, applies the same formatting as detect_levels() to ensure consistency (e.g., 6.0 -> “6”).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data | DataFrame | DataFrame to modify. | required |
factors | dict[str, list[str]] | Dict mapping column names to ordered level lists. Level order determines reference category (first = reference). | required |
Returns:
| Type | Description |
|---|---|
DataFrame | DataFrame with specified columns converted to Enum type. |
Examples:
>>> import polars as pl
>>> df = pl.DataFrame({'group': ['B', 'A', 'C', 'A', 'B']})
>>> df = ensure_enum(df, {'group': ['A', 'B', 'C']})
>>> df['group'].dtype
Enum(categories=['A', 'B', 'C'])
get_levels¶
get_levels(series: pl.Series) -> list[str]
Get the level ordering from an Enum or Categorical series.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
series | Series | Polars series with Enum or Categorical dtype. | required |
Returns:
| Type | Description |
|---|---|
list[str] | List of category levels in their defined order. |
Examples:
>>> import polars as pl
>>> s = pl.Series('x', ['B', 'A']).cast(pl.Enum(['A', 'B', 'C']))
>>> get_levels(s)
['A', 'B', 'C']
evaluate¶
Term evaluation — AST nodes to design matrix columns.
Classes:
| Name | Description |
|---|---|
TermResult | Result of evaluating one formula term. |
Functions:
| Name | Description |
|---|---|
evaluate_call | Evaluate a function call term. |
evaluate_categorical | Evaluate a categorical variable with contrast encoding. |
evaluate_interaction | Evaluate an interaction term (a:b). |
evaluate_star | Evaluate a * term (main effects + interaction): a * b = a + b + a:b. |
evaluate_term | Evaluate a single formula term against data. |
evaluate_variable | Evaluate a simple variable reference. |
Attributes¶
Classes¶
TermResult¶
Result of evaluating one formula term.
Attributes:
| Name | Type | Description |
|---|---|---|
columns | NDArray[float64] | Data array, shape (n_obs,) or (n_obs, k). |
labels | list[str] | Column names for the result. |
state_updates | dict | Partial state to merge back into accumulators. May contain keys: “factors”, “contrast_matrices”, “contrast_types”, “transform_state”, “transforms”. |
Attributes¶
columns¶
columns: NDArray[np.float64]
labels¶
labels: list[str]
state_updates¶
state_updates: dict
Functions¶
evaluate_call¶
evaluate_call(call: Call, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool) -> tuple[TermResult, bool]
Evaluate a function call term.
Handles: contrast functions, transform, center/norm/zscore/scale, log/log10/sqrt.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
call | Call | Call AST node. | required |
data | DataFrame | Polars DataFrame. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
transform_state | dict[str, dict] | Current transform state mapping. | required |
transforms | dict[str, object] | Current fitted transform instances. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
intercept_absorbed | bool | Whether intercept df has been absorbed. | required |
Returns:
| Type | Description |
|---|---|
tuple[TermResult, bool] | (TermResult, updated_intercept_absorbed). |
evaluate_categorical¶
evaluate_categorical(name: str, series: pl.Series, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool, *, spans_intercept: bool | None = None, contrast_type: str | None = None) -> TermResult
Evaluate a categorical variable with contrast encoding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name | str | Variable name. | required |
series | Series | Polars series with categorical data. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
intercept_absorbed | bool | Whether intercept df has been absorbed. | required |
spans_intercept | bool | None | Whether this categorical should span the intercept. If None, determined automatically. | None |
contrast_type | str | None | Type of contrast to use. If None, defaults to “treatment”. | None |
Returns:
| Type | Description |
|---|---|
TermResult | TermResult with encoded data and state updates. |
evaluate_interaction¶
evaluate_interaction(term: Binary, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool, *, is_toplevel: bool = True) -> tuple[TermResult, bool]
Evaluate an interaction term (a:b).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
term | Binary | Binary AST node with COLON operator. | required |
data | DataFrame | Polars DataFrame. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
transform_state | dict[str, dict] | Current transform state. | required |
transforms | dict[str, object] | Current fitted transforms. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
intercept_absorbed | bool | Whether intercept df has been absorbed. | required |
is_toplevel | bool | Whether this is a top-level interaction term. | True |
Returns:
| Type | Description |
|---|---|
tuple[TermResult, bool] | (TermResult, updated_intercept_absorbed). |
evaluate_star¶
evaluate_star(term: Binary, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool) -> tuple[TermResult, bool]
Evaluate a * term (main effects + interaction): a * b = a + b + a:b.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
term | Binary | Binary AST node with STAR operator. | required |
data | DataFrame | Polars DataFrame. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
transform_state | dict[str, dict] | Current transform state. | required |
transforms | dict[str, object] | Current fitted transforms. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
intercept_absorbed | bool | Whether intercept df has been absorbed. | required |
Returns:
| Type | Description |
|---|---|
tuple[TermResult, bool] | (TermResult, updated_intercept_absorbed). |
evaluate_term¶
evaluate_term(term: object, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool) -> tuple[TermResult, bool]
Evaluate a single formula term against data.
Dispatches to evaluate_variable, evaluate_call, evaluate_interaction, etc. based on AST node type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
term | object | AST node representing the term. | required |
data | DataFrame | Polars DataFrame with training data. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices mapping. | required |
transform_state | dict[str, dict] | Current transform state mapping. | required |
transforms | dict[str, object] | Current fitted transform instances mapping. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
intercept_absorbed | bool | Whether intercept df has been absorbed. | required |
Returns:
| Type | Description |
|---|---|
tuple[TermResult, bool] | (TermResult, updated_intercept_absorbed). |
evaluate_variable¶
evaluate_variable(name: str, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool) -> TermResult
Evaluate a simple variable reference.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name | str | Column name. | required |
data | DataFrame | Polars DataFrame. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
intercept_absorbed | bool | Whether intercept df has been absorbed. | required |
Returns:
| Type | Description |
|---|---|
TermResult | TermResult with evaluated data. |
evaluate_contrast¶
Contrast function evaluation for formula-syntax categorical encoding.
Handles formula expressions like treatment(x, ref=B), sum(x, omit=A),
helmert(x, [low, med, high]), poly(x, [lo, hi], degree=2).
Each contrast function maps to a builder from design.coding and
a label generator. The function name in the formula IS the encoding scheme.
Functions:
| Name | Description |
|---|---|
evaluate_contrast_call | Evaluate a contrast encoding function call. |
Attributes¶
Classes¶
Functions¶
evaluate_contrast_call¶
evaluate_contrast_call(call: Call, func_name: str, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], custom_contrasts: dict[str, NDArray], intercept_absorbed: bool, *, spans_intercept: bool | None = None) -> tuple[TermResult, bool]
Evaluate a contrast encoding function call.
Dispatches to the appropriate contrast matrix builder based on the function name (treatment, sum, helmert, sequential, poly).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
call | Call | Call AST node (e.g., treatment(x, ref=B)). | required |
func_name | str | Canonical function name (already resolved from aliases). | required |
data | DataFrame | Polars DataFrame with training data. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices mapping. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices (ndarray overrides). | required |
intercept_absorbed | bool | Whether intercept df has been absorbed. | required |
spans_intercept | bool | None | Whether this categorical should span the intercept. If None, determined automatically from intercept_absorbed. | None |
Returns:
| Type | Description |
|---|---|
tuple[TermResult, bool] | (TermResult, updated_intercept_absorbed). |
evaluate_newdata¶
Newdata evaluation — apply learned encoding to new observations.
Functions:
| Name | Description |
|---|---|
evaluate_newdata | Apply learned encoding from FormulaSpec to new data. |
Attributes¶
Classes¶
Functions¶
evaluate_newdata¶
evaluate_newdata(spec: FormulaSpec, data: pl.DataFrame, *, on_unseen_level: str = 'error') -> NDArray[np.float64]
Apply learned encoding from FormulaSpec to new data.
Pure function: reads factor levels, contrast matrices, and transform state from spec. No mutation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | FormulaSpec | FormulaSpec with learned encoding from build_design_matrices(). | required |
data | DataFrame | New data as Polars DataFrame. | required |
on_unseen_level | str | How to handle unseen categorical levels: "error" raises ValueError (default), "warn" warns and encodes as zeros, "ignore" silently encodes as zeros. | 'error'
Returns:
| Type | Description |
|---|---|
NDArray[float64] | X matrix for new observations, shape (n_new, n_features). Column order matches the original build_design_matrices() output.
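A minimal sketch of the unseen-level policy described above, using plain one-hot encoding against the training levels. The helper name `encode_with_learned_levels` is hypothetical; the real function reads levels and contrasts from the FormulaSpec.

```python
import warnings

def encode_with_learned_levels(values, levels, on_unseen_level="error"):
    # One-hot encode new observations against the *training* levels.
    # Unseen levels follow the documented "error" / "warn" / "ignore"
    # policy: raise, warn-and-zero, or silently zero.
    rows = []
    for v in values:
        if v not in levels:
            if on_unseen_level == "error":
                raise ValueError(f"Unseen categorical level: {v!r}")
            if on_unseen_level == "warn":
                warnings.warn(f"Unseen level {v!r} encoded as zeros")
            rows.append([0.0] * len(levels))
        else:
            rows.append([1.0 if lvl == v else 0.0 for lvl in levels])
    return rows
```

Zero-encoding keeps the column count (and hence the feature order) identical to the training design matrix.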
evaluate_transforms¶
Transform evaluation — stateful, math, and polynomial transforms.
Extracted from evaluate.py to keep file sizes manageable and provide a clean home for nested transform evaluation logic.
Functions:
| Name | Description |
|---|---|
evaluate_math_transform | Evaluate a math transform like log(), sqrt(). |
evaluate_stateful_transform | Evaluate a stateful transform like center(), scale(), rank(). |
resolve_transform_arg | Resolve a transform argument to raw data, handling nested calls. |
Classes¶
Functions¶
evaluate_math_transform¶
evaluate_math_transform(func_name: str, arg: object, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray]) -> TermResultEvaluate a math transform like log(), sqrt().
Supports nested calls: log(rank(x)) evaluates rank(x) first, then applies log to the result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
func_name | str | Transform name. | required |
arg | object | AST node for the argument (Variable or nested Call). | required |
data | DataFrame | Polars DataFrame. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
transform_state | dict[str, dict] | Current transform state mapping. | required |
transforms | dict[str, object] | Current fitted transform instances. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
Returns:
| Type | Description |
|---|---|
TermResult | TermResult with transformed data. |
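The inner-call-first evaluation order can be sketched without the library. Both helpers below are hypothetical stand-ins (`rank` here uses simple 1-based ranks without tie averaging); the point is only the nesting: `rank(x)` is resolved before `log` is applied.

```python
import math

def rank(xs: list[float]) -> list[float]:
    # Simple 1-based ranking (no tie averaging), a stand-in for rank()
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    return ranks

def log_of_rank(xs: list[float]) -> list[float]:
    # log(rank(x)): evaluate the nested call first,
    # then apply the math transform to its result
    return [math.log(v) for v in rank(xs)]
```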
evaluate_stateful_transform¶
evaluate_stateful_transform(func_name: str, arg: object, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray]) -> TermResultEvaluate a stateful transform like center(), scale(), rank().
Supports nested calls: zscore(rank(x)) evaluates rank(x) first, then applies zscore to the result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
func_name | str | Transform name. | required |
arg | object | AST node for the argument (Variable or nested Call). | required |
data | DataFrame | Polars DataFrame. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
transform_state | dict[str, dict] | Current transform state mapping. | required |
transforms | dict[str, object] | Current fitted transform instances. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
Returns:
| Type | Description |
|---|---|
TermResult | TermResult with transformed data and state updates. |
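"Stateful" here means the transform learns statistics on training data and replays them on new data. A minimal sketch of that fit/apply split for a centering transform (helper names hypothetical — the real state lives in the spec's transform_state mapping):

```python
def fit_center(xs: list[float]) -> dict:
    # Fit step: learn the training mean and record it as state
    return {"mean": sum(xs) / len(xs)}

def apply_center(xs: list[float], state: dict) -> list[float]:
    # Apply step: reuse the *learned* mean on new observations,
    # so newdata is centered with training statistics
    return [x - state["mean"] for x in xs]
```

This is why evaluate_newdata() can be a pure function: it only replays state recorded here.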
resolve_transform_arg¶
resolve_transform_arg(arg: object, data: pl.DataFrame, factors: dict[str, list[str]], contrast_matrices: dict[str, NDArray], transform_state: dict[str, dict], transforms: dict[str, object], custom_contrasts: dict[str, NDArray]) -> tuple[NDArray[np.float64], str, dict]Resolve a transform argument to raw data, handling nested calls.
If the argument is a simple variable, fetches it from the DataFrame. If the argument is a nested Call (e.g. rank(x) inside zscore(rank(x))), recursively evaluates the inner call first.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
arg | object | AST node — Variable, QuotedName, or Call. | required |
data | DataFrame | Polars DataFrame. | required |
factors | dict[str, list[str]] | Current factor levels mapping. | required |
contrast_matrices | dict[str, NDArray] | Current contrast matrices. | required |
transform_state | dict[str, dict] | Current transform state mapping. | required |
transforms | dict[str, object] | Current fitted transform instances. | required |
custom_contrasts | dict[str, NDArray] | User-provided contrast matrices. | required |
Returns:
| Type | Description |
|---|---|
NDArray[float64] | (raw_data, label, inner_state_updates) where raw_data is a 1-D float64 |
str | array, label is the display name (e.g. “x” or “rank(x)”), and |
dict | inner_state_updates is any state produced by inner transforms. |
helpers¶
Shared AST utilities for formula operations.
Functions:
| Name | Description |
|---|---|
contains_pipe | Check if an AST node contains a PIPE operator (random effect). |
extract_name | Extract variable name or literal value from AST node. |
variable_not_found_error | Create informative error for missing variable. |
Classes¶
Functions¶
contains_pipe¶
contains_pipe(node: object) -> boolCheck if an AST node contains a PIPE operator (random effect).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
node | object | AST node to check. | required |
Returns:
| Type | Description |
|---|---|
bool | True if node contains a PIPE operator. |
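The check is a plain recursive walk over Binary nodes. A self-contained sketch with toy AST classes (the real Token/Binary types are defined in the parser module):

```python
from dataclasses import dataclass

@dataclass
class Token:
    kind: str
    lexeme: str

@dataclass
class Binary:
    left: object
    operator: Token
    right: object

def contains_pipe_sketch(node: object) -> bool:
    # Recursively walk Binary nodes; a PIPE operator anywhere
    # in the subtree marks the term as a random effect
    if isinstance(node, Binary):
        if node.operator.kind == "PIPE":
            return True
        return contains_pipe_sketch(node.left) or contains_pipe_sketch(node.right)
    return False
```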
extract_name¶
extract_name(node: object) -> str | NoneExtract variable name or literal value from AST node.
Handles interaction terms (a:b) by recursively extracting names from both sides and joining with ':'.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
node | object | AST node to extract name from. | required |
Returns:
| Type | Description |
|---|---|
str | None | Variable name string, or None if not extractable. |
variable_not_found_error¶
variable_not_found_error(name: str, data_columns: list[str]) -> ValueErrorCreate informative error for missing variable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name | str | Variable name that was not found. | required |
data_columns | list[str] | List of available column names in the data. | required |
Returns:
| Type | Description |
|---|---|
ValueError | ValueError with a helpful message including available columns and "did you mean?" suggestions.
parse¶
R-style formula string parsing into FormulaSpec containers.
Classes:
| Name | Description |
|---|---|
FormulaError | Exception raised for formula parsing errors. |
FormulaStructure | Data-free formula structure extracted from an AST. |
Functions:
| Name | Description |
|---|---|
expand_double_verts | Expand \|\| syntax into separate uncorrelated random effects terms. |
expand_nested_syntax | Expand nested / syntax into separate crossed random effects terms. |
extract_formula_structure | Extract formula structure from a formula string without data. |
parse_formula | Parse formula and detect categoricals from data. |
Attributes¶
Classes¶
FormulaError¶
FormulaError(message: str, formula: str | None = None, position: int | None = None) -> NoneBases: ValueError
Exception raised for formula parsing errors.
Provides helpful error messages with pointer to error position.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
message | str | Error description. | required |
formula | str | None | The formula that caused the error. | None |
position | int | None | Character position of the error (optional). | None |
Attributes:
| Name | Type | Description |
|---|---|---|
formula | ||
position |
Attributes¶
formula¶
formula = formula
position¶
position = position
FormulaStructure¶
Data-free formula structure extracted from an AST.
Contains the same structural information as a full parse_formula()
call but without requiring data for categorical detection. Used by
build_model_spec_from_formula() to replace regex-based extraction.
Attributes:
| Name | Type | Description |
|---|---|---|
response_var | str | None | Response variable name, or None for RHS-only formulas. |
response_transform | tuple[str, ...] | None | Tuple of LHS transforms (innermost-first), or None if no transforms. |
fixed_term_names | tuple[str, ...] | Human-readable fixed-effect term names (e.g. ["Intercept", "x", "group"]). |
has_intercept | bool | Whether the formula includes an intercept. |
has_random_effects | bool | Whether the formula contains random-effects (\|) terms. |
random_terms_raw | tuple[str, ...] | Raw string representations of RE terms (e.g. ["(1 \| group)"]). |
Attributes¶
fixed_term_names¶
fixed_term_names: tuple[str, ...]
has_intercept¶
has_intercept: bool
has_random_effects¶
has_random_effects: bool
random_terms_raw¶
random_terms_raw: tuple[str, ...]
response_transform¶
response_transform: tuple[str, ...] | None
response_var¶
response_var: str | None
Functions¶
expand_double_verts¶
expand_double_verts(formula: str) -> tuple[str, dict]Expand || syntax into separate uncorrelated random effects terms.
This matches lme4’s expandDoubleVerts() function. The || syntax creates independent (uncorrelated) random effects by expanding to separate terms.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style formula string potentially containing \|\| in RE terms. | required
Returns:
| Type | Description |
|---|---|
tuple[str, dict] | A tuple of: - Expanded formula string with \|\| replaced by separate terms - Metadata dict tracking which terms came from \|\| expansion |
Examples:
>>> expand_double_verts("y ~ x + (Days || Subject)")
('y ~ x + (1 | Subject) + (0 + Days | Subject)', {...})
>>> expand_double_verts("y ~ x + (1 + x + y || group)")
('y ~ x + (1 | group) + (0 + x | group) + (0 + y | group)', {...})
Note: The transformation rules are:
(x || g) -> (1 | g) + (0 + x | g)
(1 + x || g) -> (1 | g) + (0 + x | g)
(1 + x + y || g) -> (1 | g) + (0 + x | g) + (0 + y | g)
(0 + x || g) -> (0 + x | g) [no intercept term added]
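The transformation rules above can be sketched with a small regex rewrite. This is a hypothetical simplification — the real function operates alongside the parser and tracks expansion metadata — and it only handles flat term lists without nested parentheses.

```python
import re

def expand_double_verts_sketch(formula: str) -> str:
    # Rewrite each (terms || g) group per the rules above:
    # keep a (1 | g) intercept term unless "0" suppresses it,
    # then emit one uncorrelated (0 + term | g) per slope
    def repl(m: re.Match) -> str:
        lhs, group = m.group(1).strip(), m.group(2).strip()
        parts = [p.strip() for p in lhs.split("+")]
        terms = []
        if "0" not in parts:
            terms.append(f"(1 | {group})")
        for p in parts:
            if p not in ("0", "1"):
                terms.append(f"(0 + {p} | {group})")
        return " + ".join(terms)
    return re.sub(r"\(([^|()]+)\|\|([^)]+)\)", repl, formula)
```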
expand_nested_syntax¶
expand_nested_syntax(formula: str) -> tuple[str, dict]Expand nested / syntax into separate crossed random effects terms.
This matches lme4’s behavior where nested syntax (a/b) is syntactic sugar for separate terms: (1|a/b) expands to (1|a) + (1|a:b).
The : in the grouping factor creates an interaction grouping where each unique combination of levels becomes a separate group.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style formula string potentially containing / in RE terms. | required |
Returns:
| Type | Description |
|---|---|
tuple[str, dict] | A tuple of: - Expanded formula string with / replaced by separate terms - Metadata dict tracking which terms came from / expansion |
Examples:
>>> expand_nested_syntax("y ~ x + (1|school/class)")
('y ~ x + (1|school) + (1|school:class)', {...})
>>> expand_nested_syntax("y ~ x + (1|a/b/c)")
('y ~ x + (1|a) + (1|a:b) + (1|a:b:c)', {...})
>>> expand_nested_syntax("y ~ x + (Days|Subject/Session)")
('y ~ x + (Days|Subject) + (Days|Subject:Session)', {...})
Note: The transformation rules are:
(1|a/b) -> (1|a) + (1|a:b)
(1|a/b/c) -> (1|a) + (1|a:b) + (1|a:b:c)
(x|a/b) -> (x|a) + (x|a:b)
(1 + x|a/b) -> (1 + x|a) + (1 + x|a:b)
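The / expansion rules above are cumulative prefixes joined with `:`. A hypothetical regex sketch (the real implementation also returns expansion metadata and handles edge cases the regex cannot):

```python
import re

def expand_nested_sketch(formula: str) -> str:
    # (lhs|a/b/c) -> (lhs|a) + (lhs|a:b) + (lhs|a:b:c):
    # each term groups on a cumulative prefix of the nesting path
    def repl(m: re.Match) -> str:
        lhs, grouping = m.group(1), m.group(2)
        factors = grouping.split("/")
        return " + ".join(
            f"({lhs}|{':'.join(factors[: i + 1])})" for i in range(len(factors))
        )
    return re.sub(r"\(([^|)]+)\|([^)|]*/[^)|]*)\)", repl, formula)
```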
extract_formula_structure¶
extract_formula_structure(formula: str) -> FormulaStructureExtract formula structure from a formula string without data.
Parses the formula into an AST and walks it to extract response variable, fixed-effect term names, intercept presence, and random-effect term strings. Does not require data (no categorical detection).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style formula string (e.g. "y ~ x + (1 \| group)"). | required
Returns:
| Type | Description |
|---|---|
FormulaStructure | FormulaStructure with extracted information. |
parse_formula¶
parse_formula(formula: str, data: pl.DataFrame, *, factors: dict[str, list[str]] | None = None, custom_contrasts: dict[str, NDArray] | None = None) -> FormulaSpecParse formula and detect categoricals from data.
This does parsing + categorical detection but NOT matrix construction. Reuses parser/ for tokenization and AST construction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | R-style formula string (e.g., “y ~ x + z”). | required |
data | DataFrame | Polars DataFrame to detect categoricals from. | required |
factors | dict[str, list[str]] | None | Optional dict mapping column names to level orderings. If provided, these orderings are used for categorical encoding. | None |
custom_contrasts | dict[str, NDArray] | None | Optional dict of user-provided contrast matrices. | None |
Returns:
| Type | Description |
|---|---|
FormulaSpec | FormulaSpec with parsed terms and detected factor levels. |
parser¶
Recursive descent parser for statistical formula strings.
Modules:
| Name | Description |
|---|---|
expr | AST expression node types for formula parsing. |
parser | Recursive descent parser for formula strings. |
scanner | Formula string scanner/tokenizer. |
token | Token class for formula parsing. |
Classes:
| Name | Description |
|---|---|
Assign | Expression for assignments (e.g., x=value in function calls). |
Binary | Expression for binary operations (e.g., x + y, x ~ y). |
Call | Expression for function calls (e.g., factor(x), center(y)). |
Grouping | Expression for parenthesized groups. |
ListExpr | Expression for bracket list literals (e.g., [low, med, high]). |
Literal | Expression for literal values (numbers, strings, etc.). |
ParseError | Error raised during formula parsing. |
Parser | Parse a sequence of Tokens and return an abstract syntax tree. |
QuotedName | Expression for back-quoted names (e.g., `weird column name!`). |
ScanError | Error raised during formula scanning. |
Scanner | Scan formula string and return Tokens. |
Token | Representation of a single Token. |
Unary | Expression for unary operations (e.g., -x, +x). |
Variable | Expression for variable references. |
Classes¶
Assign¶
Assign(name: 'Variable', value: object) -> NoneExpression for assignments (e.g., x=value in function calls).
Attributes:
| Name | Type | Description |
|---|---|---|
name | ||
value |
Attributes¶
name¶
name = namevalue¶
value = valueBinary¶
Binary(left: object, operator: Token, right: object) -> NoneExpression for binary operations (e.g., x + y, x ~ y).
Attributes:
| Name | Type | Description |
|---|---|---|
left | ||
operator | ||
right |
Attributes¶
left¶
left = leftoperator¶
operator = operatorright¶
right = rightCall¶
Call(callee: object, args: list) -> NoneExpression for function calls (e.g., factor(x), center(y)).
Attributes:
| Name | Type | Description |
|---|---|---|
args | ||
callee |
Attributes¶
args¶
args = argscallee¶
callee = calleeGrouping¶
Grouping(expression: object) -> NoneExpression for parenthesized groups.
Attributes:
| Name | Type | Description |
|---|---|---|
expression |
Attributes¶
expression¶
expression = expressionListExpr¶
ListExpr(elements: list[object]) -> NoneExpression for bracket list literals (e.g., [low, med, high]).
Used for level ordering in contrast functions like helmert(x, [low, med, high]).
Attributes:
| Name | Type | Description |
|---|---|---|
elements |
Attributes¶
elements¶
elements = elementsLiteral¶
Literal(value: object, lexeme: str | None = None) -> NoneExpression for literal values (numbers, strings, etc.).
Attributes:
| Name | Type | Description |
|---|---|---|
lexeme | ||
value |
Attributes¶
lexeme¶
lexeme = lexemevalue¶
value = valueParseError¶
Bases: Exception
Error raised during formula parsing.
Parser¶
Parser(tokens: list[Token], formula: str = '') -> NoneParse a sequence of Tokens and return an abstract syntax tree.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tokens | list[Token] | A list of Token objects as returned by Scanner.scan(). | required |
formula | str | The original formula string (for error messages). | ‘’ |
Functions:
| Name | Description |
|---|---|
addition | |
advance | |
assignment | |
at_end | |
call | |
check | Check if current token matches any of the given types. |
comparison | |
consume | Consume the next Token, raising ParseError if it doesn’t match. |
expression | |
finish_call | |
format_error_context | Format a parse error with visual pointer to the error location. |
interaction | |
match | Match and consume token if it matches any of the given types. |
multiple_interaction | |
multiplication | |
parse | Parse a sequence of Tokens. |
peek | Return the Token we are about to consume. |
previous | Return the last Token we consumed. |
primary | |
random_effect | |
tilde | |
unary |
Attributes:
| Name | Type | Description |
|---|---|---|
current | ||
formula | ||
tokens |
Attributes¶
current¶
current = 0formula¶
formula = formulatokens¶
tokens = tokensFunctions¶
addition¶
addition() -> objectadvance¶
advance() -> Token | Noneassignment¶
assignment() -> objectat_end¶
at_end() -> boolcall¶
call() -> objectcheck¶
check(types: str | list[str]) -> boolCheck if current token matches any of the given types.
comparison¶
comparison() -> objectconsume¶
consume(kind: str, message: str) -> TokenConsume the next Token, raising ParseError if it doesn’t match.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
kind | str | Expected token kind. | required |
message | str | Error message if token doesn’t match. | required |
Returns:
| Type | Description |
|---|---|
Token | The consumed token. |
expression¶
expression() -> objectfinish_call¶
finish_call(expr: object) -> Callformat_error_context¶
format_error_context(position: int, message: str) -> strFormat a parse error with visual pointer to the error location.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
position | int | Character offset where error occurred. | required |
message | str | The error description. | required |
Returns:
| Type | Description |
|---|---|
str | Formatted error message with context and pointer. |
interaction¶
interaction() -> objectmatch¶
match(types: str | list[str]) -> boolMatch and consume token if it matches any of the given types.
multiple_interaction¶
multiple_interaction() -> objectmultiplication¶
multiplication() -> objectparse¶
parse() -> objectParse a sequence of Tokens.
Returns:
| Type | Description |
|---|---|
object | An AST expression node representing the parsed formula. |
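The method list above (tilde, addition, interaction, …) is the classic recursive-descent precedence chain: each level parses the tighter-binding level below it first. A toy, self-contained illustration of that pattern for `~`, `+`, and `:` only (not the real Parser, which handles many more node types):

```python
import re

def tokenize(src: str) -> list[str]:
    # Toy scanner: identifiers plus the ~, +, : operators
    return re.findall(r"[A-Za-z_]\w*|[~+:]", src)

class MiniParser:
    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.current = 0

    def peek(self):
        return self.tokens[self.current] if self.current < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.tokens[self.current]
        self.current += 1
        return tok

    def tilde(self):
        # Loosest-binding level: formula = addition ('~' addition)?
        left = self.addition()
        if self.peek() == "~":
            self.advance()
            return ("~", left, self.addition())
        return left

    def addition(self):
        # addition = interaction ('+' interaction)*
        left = self.interaction()
        while self.peek() == "+":
            self.advance()
            left = ("+", left, self.interaction())
        return left

    def interaction(self):
        # Tightest-binding level: interaction = name (':' name)*
        left = self.advance()
        while self.peek() == ":":
            self.advance()
            left = (":", left, self.advance())
        return left
```

Because `interaction` is called from `addition`, `a:b` binds tighter than `+`, exactly as in R formulas.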
peek¶
peek() -> TokenReturn the Token we are about to consume.
previous¶
previous() -> TokenReturn the last Token we consumed.
primary¶
primary() -> objectrandom_effect¶
random_effect() -> objecttilde¶
tilde() -> objectunary¶
unary() -> objectQuotedName¶
QuotedName(expression: Token) -> NoneExpression for back-quoted names (e.g., `weird column name!`).
Attributes:
| Name | Type | Description |
|---|---|---|
expression |
Attributes¶
expression¶
expression = expressionScanError¶
Bases: Exception
Error raised during formula scanning.
Scanner¶
Scanner(code: str) -> NoneScan formula string and return Tokens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
code | str | The formula string to scan. | required |
Functions:
| Name | Description |
|---|---|
add_token | |
advance | |
at_end | |
backquote | |
char | |
floatnum | |
identifier | |
match | |
number | |
peek | |
peek_next | |
scan | Scan formula string. |
scan_token |
Attributes:
| Name | Type | Description |
|---|---|---|
code | ||
current | ||
start | ||
tokens | list[Token] |
Attributes¶
code¶
code = codecurrent¶
current = 0start¶
start = 0tokens¶
tokens: list[Token] = []Functions¶
add_token¶
add_token(kind: str, literal: object = None) -> Noneadvance¶
advance() -> strat_end¶
at_end() -> boolbackquote¶
backquote() -> Nonechar¶
char() -> Nonefloatnum¶
floatnum() -> Noneidentifier¶
identifier() -> Nonematch¶
match(expected: str) -> boolnumber¶
number() -> Nonepeek¶
peek() -> strpeek_next¶
peek_next() -> strscan¶
scan(add_intercept: bool = True) -> list[Token]Scan formula string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
add_intercept | bool | Whether to add an implicit intercept. Defaults to True. | True |
Returns:
| Type | Description |
|---|---|
list[Token] | A list of Token objects. |
scan_token¶
scan_token() -> NoneToken¶
Token(kind: str, lexeme: str, literal: object = None, position: int = 0) -> NoneRepresentation of a single Token.
Attributes:
| Name | Type | Description |
|---|---|---|
kind | Token type (e.g., “IDENTIFIER”, “PLUS”, “TILDE”). | |
lexeme | The actual string from the source. | |
literal | Parsed literal value (for numbers, strings). | |
position | Character offset in the original formula string. |
Attributes¶
kind¶
kind = kindlexeme¶
lexeme = lexemeliteral¶
literal = literalposition¶
position = positionUnary¶
Unary(operator: Token, right: object) -> NoneExpression for unary operations (e.g., -x, +x).
Attributes:
| Name | Type | Description |
|---|---|---|
operator | ||
right |
Attributes¶
operator¶
operator = operatorright¶
right = rightVariable¶
Variable(name: Token, level: 'Literal | None' = None) -> NoneExpression for variable references.
Attributes:
| Name | Type | Description |
|---|---|---|
level | ||
name |
Attributes¶
level¶
level = levelname¶
name = nameModules¶
expr¶
AST expression node types for formula parsing.
Vendored from the formulae library.
Classes:
| Name | Description |
|---|---|
Assign | Expression for assignments (e.g., x=value in function calls). |
Binary | Expression for binary operations (e.g., x + y, x ~ y). |
Call | Expression for function calls (e.g., factor(x), center(y)). |
Grouping | Expression for parenthesized groups. |
ListExpr | Expression for bracket list literals (e.g., [low, med, high]). |
Literal | Expression for literal values (numbers, strings, etc.). |
QuotedName | Expression for back-quoted names (e.g., `weird column name!`). |
Unary | Expression for unary operations (e.g., -x, +x). |
Variable | Expression for variable references. |
Classes¶
Assign¶
Assign(name: 'Variable', value: object) -> NoneExpression for assignments (e.g., x=value in function calls).
Attributes:
| Name | Type | Description |
|---|---|---|
name | ||
value |
Attributes¶
name¶
name = namevalue¶
value = valueBinary¶
Binary(left: object, operator: Token, right: object) -> NoneExpression for binary operations (e.g., x + y, x ~ y).
Attributes:
| Name | Type | Description |
|---|---|---|
left | ||
operator | ||
right |
Attributes¶
left¶
left = leftoperator¶
operator = operatorright¶
right = rightCall¶
Call(callee: object, args: list) -> NoneExpression for function calls (e.g., factor(x), center(y)).
Attributes:
| Name | Type | Description |
|---|---|---|
args | ||
callee |
Attributes¶
args¶
args = argscallee¶
callee = calleeGrouping¶
Grouping(expression: object) -> NoneExpression for parenthesized groups.
Attributes:
| Name | Type | Description |
|---|---|---|
expression |
Attributes¶
expression¶
expression = expressionListExpr¶
ListExpr(elements: list[object]) -> NoneExpression for bracket list literals (e.g., [low, med, high]).
Used for level ordering in contrast functions like helmert(x, [low, med, high]).
Attributes:
| Name | Type | Description |
|---|---|---|
elements |
Attributes¶
elements¶
elements = elementsLiteral¶
Literal(value: object, lexeme: str | None = None) -> NoneExpression for literal values (numbers, strings, etc.).
Attributes:
| Name | Type | Description |
|---|---|---|
lexeme | ||
value |
Attributes¶
lexeme¶
lexeme = lexemevalue¶
value = valueQuotedName¶
QuotedName(expression: Token) -> NoneExpression for back-quoted names (e.g., `weird column name!`).
Attributes:
| Name | Type | Description |
|---|---|---|
expression |
Attributes¶
expression¶
expression = expressionUnary¶
Unary(operator: Token, right: object) -> NoneExpression for unary operations (e.g., -x, +x).
Attributes:
| Name | Type | Description |
|---|---|---|
operator | ||
right |
Attributes¶
operator¶
operator = operatorright¶
right = rightVariable¶
Variable(name: Token, level: 'Literal | None' = None) -> NoneExpression for variable references.
Attributes:
| Name | Type | Description |
|---|---|---|
level | ||
name |
Attributes¶
level¶
level = levelname¶
name = nameparser¶
Recursive descent parser for formula strings.
Vendored from the formulae library.
Classes:
| Name | Description |
|---|---|
ParseError | Error raised during formula parsing. |
Parser | Parse a sequence of Tokens and return an abstract syntax tree. |
Functions:
| Name | Description |
|---|---|
listify | Wrap non-list objects in a list. |
Classes¶
ParseError¶
Bases: Exception
Error raised during formula parsing.
Parser¶
Parser(tokens: list[Token], formula: str = '') -> NoneParse a sequence of Tokens and return an abstract syntax tree.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tokens | list[Token] | A list of Token objects as returned by Scanner.scan(). | required |
formula | str | The original formula string (for error messages). | ‘’ |
Functions:
| Name | Description |
|---|---|
addition | |
advance | |
assignment | |
at_end | |
call | |
check | Check if current token matches any of the given types. |
comparison | |
consume | Consume the next Token, raising ParseError if it doesn’t match. |
expression | |
finish_call | |
format_error_context | Format a parse error with visual pointer to the error location. |
interaction | |
match | Match and consume token if it matches any of the given types. |
multiple_interaction | |
multiplication | |
parse | Parse a sequence of Tokens. |
peek | Return the Token we are about to consume. |
previous | Return the last Token we consumed. |
primary | |
random_effect | |
tilde | |
unary |
Attributes:
| Name | Type | Description |
|---|---|---|
current | ||
formula | ||
tokens |
Attributes¶
current¶
current = 0formula¶
formula = formulatokens¶
tokens = tokensFunctions¶
addition¶
addition() -> objectadvance¶
advance() -> Token | Noneassignment¶
assignment() -> objectat_end¶
at_end() -> boolcall¶
call() -> objectcheck¶
check(types: str | list[str]) -> boolCheck if current token matches any of the given types.
comparison¶
comparison() -> objectconsume¶
consume(kind: str, message: str) -> TokenConsume the next Token, raising ParseError if it doesn’t match.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
kind | str | Expected token kind. | required |
message | str | Error message if token doesn’t match. | required |
Returns:
| Type | Description |
|---|---|
Token | The consumed token. |
expression¶
expression() -> objectfinish_call¶
finish_call(expr: object) -> Callformat_error_context¶
format_error_context(position: int, message: str) -> strFormat a parse error with visual pointer to the error location.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
position | int | Character offset where error occurred. | required |
message | str | The error description. | required |
Returns:
| Type | Description |
|---|---|
str | Formatted error message with context and pointer. |
interaction¶
interaction() -> objectmatch¶
match(types: str | list[str]) -> boolMatch and consume token if it matches any of the given types.
multiple_interaction¶
multiple_interaction() -> objectmultiplication¶
multiplication() -> objectparse¶
parse() -> objectParse a sequence of Tokens.
Returns:
| Type | Description |
|---|---|
object | An AST expression node representing the parsed formula. |
peek¶
peek() -> TokenReturn the Token we are about to consume.
previous¶
previous() -> TokenReturn the last Token we consumed.
primary¶
primary() -> objectrandom_effect¶
random_effect() -> objecttilde¶
tilde() -> objectunary¶
unary() -> objectFunctions¶
listify¶
listify(obj: str | list[str] | None) -> list[str]Wrap non-list objects in a list.
scanner¶
Formula string scanner/tokenizer.
Vendored from the formulae library.
Classes:
| Name | Description |
|---|---|
ScanError | Error raised during formula scanning. |
Scanner | Scan formula string and return Tokens. |
Functions:
| Name | Description |
|---|---|
format_error_context | Format a scan error with visual pointer to the error location. |
Classes¶
ScanError¶
Bases: Exception
Error raised during formula scanning.
Scanner¶
Scanner(code: str) -> NoneScan formula string and return Tokens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
code | str | The formula string to scan. | required |
Functions:
| Name | Description |
|---|---|
add_token | |
advance | |
at_end | |
backquote | |
char | |
floatnum | |
identifier | |
match | |
number | |
peek | |
peek_next | |
scan | Scan formula string. |
scan_token |
Attributes:
| Name | Type | Description |
|---|---|---|
code | ||
current | ||
start | ||
tokens | list[Token] |
Attributes¶
code¶
code = codecurrent¶
current = 0start¶
start = 0tokens¶
tokens: list[Token] = []Functions¶
add_token¶
add_token(kind: str, literal: object = None) -> Noneadvance¶
advance() -> strat_end¶
at_end() -> boolbackquote¶
backquote() -> Nonechar¶
char() -> Nonefloatnum¶
floatnum() -> Noneidentifier¶
identifier() -> Nonematch¶
match(expected: str) -> boolnumber¶
number() -> Nonepeek¶
peek() -> strpeek_next¶
peek_next() -> strscan¶
scan(add_intercept: bool = True) -> list[Token]Scan formula string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
add_intercept | bool | Whether to add an implicit intercept. Defaults to True. | True |
Returns:
| Type | Description |
|---|---|
list[Token] | A list of Token objects. |
scan_token¶
scan_token() -> NoneFunctions¶
format_error_context¶
format_error_context(formula: str, position: int, message: str) -> strFormat a scan error with visual pointer to the error location.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula | str | The original formula string. | required |
position | int | Character offset where error occurred. | required |
message | str | The error description. | required |
Returns:
| Type | Description |
|---|---|
str | Formatted error message with context and pointer. |
token¶
Token class for formula parsing.
Vendored from the formulae library.
Classes:
| Name | Description |
|---|---|
Token | Representation of a single Token. |
Classes¶
Token¶
Token(kind: str, lexeme: str, literal: object = None, position: int = 0) -> NoneRepresentation of a single Token.
Attributes:
| Name | Type | Description |
|---|---|---|
kind | Token type (e.g., “IDENTIFIER”, “PLUS”, “TILDE”). | |
lexeme | The actual string from the source. | |
literal | Parsed literal value (for numbers, strings). | |
position | Character offset in the original formula string. |
Attributes¶
kind¶
kind = kindlexeme¶
lexeme = lexemeliteral¶
literal = literalposition¶
position = positionrandom_effects¶
Random effects Z matrix construction from FormulaSpec.
Functions:
| Name | Description |
|---|---|
build_random_effects_from_spec | Build random effects design matrix from FormulaSpec. |
Classes¶
Functions¶
build_random_effects_from_spec¶
build_random_effects_from_spec(spec: FormulaSpec, data: pl.DataFrame) -> RandomEffectsInfo | NoneBuild random effects design matrix from FormulaSpec.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec | FormulaSpec | Parsed formula specification with re_terms. | required |
data | DataFrame | Training data (Polars DataFrame). | required |
Returns:
| Type | Description |
|---|---|
RandomEffectsInfo | None | RandomEffectsInfo with Z matrix and metadata, or None if no RE terms. |
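For the simplest case, a `(1 | group)` term, Z is an indicator matrix: one column per group level, one row per observation. A hypothetical stdlib sketch (the real builder also handles random slopes and multiple RE terms, and returns a RandomEffectsInfo):

```python
def random_intercept_z(groups: list[str]) -> tuple[list[list[float]], list[str]]:
    # Z for a (1 | group) term: rows are observations, columns are
    # group levels, with a 1.0 marking each observation's group
    levels = sorted(set(groups))
    z = [[1.0 if g == lvl else 0.0 for lvl in levels] for g in groups]
    return z, levels
```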