Data bundle containers for model computation.

Classes:

Name        Description
DataBundle  Validated model data (valid observations only).
REInfo      Random effects metadata.
RankInfo    Rank deficiency information for a design matrix.

Classes

DataBundle

Validated model data (valid observations only).

Contains design matrices, response vector, and metadata for model fitting. Only valid (non-NA) observations are included in X, y, and related arrays. The valid_mask tracks which rows from the original data were retained.
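
The relationship between valid_mask, n_total, and the retained rows can be sketched with plain NumPy. This is a minimal illustration only: the raw arrays, the NaN-based validity rule, and the least-squares fit are assumptions made for the example, not the behavior of build_bundle_from_data().

```python
import numpy as np

# Hypothetical raw data: 5 rows, one predictor, with missing values.
x_raw = np.array([0.0, 1.0, np.nan, 3.0, 4.0])
y_raw = np.array([1.0, np.nan, 2.5, 4.0, 5.0])

# Assumed rule: a row is valid only if every model variable is observed.
valid_mask = ~(np.isnan(x_raw) | np.isnan(y_raw))
n_total = valid_mask.size  # original row count before NA removal

# The design matrix and response keep the valid rows only.
X = np.column_stack([np.ones(valid_mask.sum()), x_raw[valid_mask]])
y = y_raw[valid_mask]

# A length-n result can be expanded back to length n_total,
# leaving NaN at the rows that were dropped.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted_full = np.full(n_total, np.nan)
fitted_full[valid_mask] = X @ beta
```

Keeping the mask alongside the reduced arrays is what lets per-observation results be aligned with the original data after fitting.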

Created by: build_bundle_from_data()
Consumed by: fit_model(), dispatch_marginal_computation(), dispatch_infer()
Augmented by: Never

Attributes:

Name           Type                        Description
X              NDArray[floating]           Fixed effects design matrix (n x p array).
y              NDArray[floating]           Response vector (1D array of length n).
X_names        tuple[str, ...]             Tuple of column names for X (length p).
y_name         str                         Name of the response variable.
valid_mask     NDArray[bool_]              Boolean mask indicating valid rows from original data.
n_total        int                         Original row count before NA removal.
Z              csc_matrix | None           Random effects design matrix (sparse csc_matrix, or None).
weights        NDArray[floating] | None    Observation weights (1D array or None).
offset         NDArray[floating] | None    Model offset (1D array or None).
factor_levels  dict[str, tuple[str, ...]]  Dict mapping factor names to their level tuples.
re_metadata    REInfo | None               Random effects metadata (or None for fixed-effects models).

Examples:

>>> import numpy as np
>>> from data import DataBundle
>>> bundle = DataBundle(
...     X=np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]),
...     y=np.array([1.0, 2.0, 3.0]),
...     X_names=["Intercept", "x"],
...     y_name="y",
...     valid_mask=np.array([True, True, True]),
...     n_total=3,
... )
>>> bundle.n
3
>>> bundle.p
2

Attributes

X
X: NDArray[np.floating] = field(validator=is_ndarray)
X_names
X_names: tuple[str, ...] = field(converter=to_tuple, validator=is_tuple_of_str)
Z
Z: sp.csc_matrix | None = field(default=None, validator=is_optional_sparse_csc)
contrast_types
contrast_types: dict[str, str] = field(factory=dict)
factor_levels
factor_levels: dict[str, tuple[str, ...]] = field(factory=dict, converter=to_tuple_of_tuples)
has_random_effects
has_random_effects: bool

Whether the bundle includes random effects.

Returns:

Type  Description
bool  True if Z matrix is present, False otherwise.
n
n: int

Number of valid observations.

Returns:

Type  Description
int   Number of rows in the design matrix X.
n_total
n_total: int = field(validator=is_positive_int)
offset
offset: NDArray[np.floating] | None = field(default=None, validator=is_optional_ndarray)
p
p: int

Number of fixed effect parameters.

Returns:

Type  Description
int   Number of columns in the design matrix X.
rank
rank: int

Effective rank of the fixed effects design matrix.

Returns:

Type  Description
int   Numerical rank if rank_info is available, otherwise p (full rank).
rank_info
rank_info: RankInfo | None = field(default=None)
re_metadata
re_metadata: REInfo | None = field(default=None)
response_levels
response_levels: tuple[str, ...] | None = field(default=None, converter=to_optional_tuple, validator=is_optional_tuple_of_str)
valid_mask
valid_mask: NDArray[np.bool_] = field(validator=is_ndarray)
weights
weights: NDArray[np.floating] | None = field(default=None, validator=is_optional_ndarray)
y
y: NDArray[np.floating] = field(validator=is_ndarray)
y_name
y_name: str = field(validator=(validators.instance_of(str)))
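
The derived properties documented above (n, p, rank, has_random_effects) follow directly from the stored fields. A minimal sketch of that logic, using hypothetical stand-in classes (BundleSketch and RankInfoSketch are illustration names, not part of the library) built on the standard dataclasses module:

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class RankInfoSketch:
    """Hypothetical stand-in that holds only the numerical rank."""
    rank: int


@dataclass
class BundleSketch:
    """Hypothetical stand-in for DataBundle; not the library's class."""
    X: np.ndarray
    Z: Optional[object] = None
    rank_info: Optional[RankInfoSketch] = None

    @property
    def n(self) -> int:
        # Number of valid observations = rows of X.
        return self.X.shape[0]

    @property
    def p(self) -> int:
        # Number of fixed effect parameters = columns of X.
        return self.X.shape[1]

    @property
    def rank(self) -> int:
        # Numerical rank if rank_info is available, otherwise p (full rank).
        return self.rank_info.rank if self.rank_info is not None else self.p

    @property
    def has_random_effects(self) -> bool:
        # True if a Z matrix is present.
        return self.Z is not None


full = BundleSketch(X=np.ones((4, 3)))
deficient = BundleSketch(X=np.ones((4, 3)), rank_info=RankInfoSketch(rank=1))
```

Here full.rank falls back to p because no rank information is attached, while deficient.rank reports the stored numerical rank.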

REInfo

Random effects metadata.

Stores information about random effect grouping structure, including the grouping variables, number of groups per variable, and indices mapping observations to groups.
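
One common way to obtain a per-observation group index array and a group count from raw labels is np.unique with return_inverse=True. The sketch below assumes a single hypothetical grouping variable named subject; how build_bundle_from_data() actually encodes groups is not specified here.

```python
import numpy as np

# Hypothetical grouping labels, one per observation.
subject = np.array(["s3", "s1", "s1", "s2", "s3", "s2"])

# levels holds the sorted unique labels; idx maps each observation
# to the position of its label in levels.
levels, idx = np.unique(subject, return_inverse=True)

grouping_vars = ("subject",)
group_indices = {"subject": idx.astype(np.intp)}
n_groups = {"subject": int(levels.size)}
```

Indexing levels with the index array recovers the original labels, which is a quick consistency check for this kind of encoding.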

Created by: build_bundle_from_data()
Consumed by: build_mixed_post_fit_state(), resolve_conditional(), fit mixed-model workflows
Augmented by: Never

Attributes:

Name            Type                                              Description
grouping_vars   tuple[str, ...]                                   Tuple of grouping variable names (e.g., ("subject", "item")).
n_groups        dict[str, int]                                    Dictionary mapping grouping variable to number of groups.
group_indices   dict[str, NDArray[intp]]                          Dictionary mapping grouping variable to group index array.
term_names      tuple[str, ...]                                   Tuple of random effect term names (e.g., ("(1
group_ids_list  list[NDArray[intp]]                               List of group ID arrays for each factor (for PLS fitting).
n_groups_list   list[int]                                         List of number of groups per factor (for PLS fitting).
re_structure    str                                               Random effects structure type (intercept/slope/diagonal/crossed/nested/mixed).
random_names    list[str]                                         Names of random effect terms (e.g., ["Intercept", "Days"]).
X_re            NDArray[float64] | list[NDArray[float64]] | None  Random effects covariates for slope models (optional).
metadata        dict                                              Dictionary with additional RE structure info for lambda_builder.

Examples:

>>> import numpy as np
>>> from data import REInfo
>>> meta = REInfo(
...     grouping_vars=["subject"],
...     n_groups={"subject": 3},
...     group_indices={"subject": np.array([0, 0, 1, 1, 2, 2])},
...     term_names=["(1|subject)"],
... )
>>> meta.grouping_vars
('subject',)
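
To show how a group index array relates to a sparse random effects design matrix, the sketch below builds a random-intercept indicator matrix in CSC format with scipy.sparse. It assumes a single grouping factor and intercept-only random effects; the library's own construction of Z may differ.

```python
import numpy as np
import scipy.sparse as sp

# Group index per observation, as in the example above (3 groups, 6 rows).
idx = np.array([0, 0, 1, 1, 2, 2], dtype=np.intp)
n_obs = idx.size
n_grp = int(idx.max()) + 1

# Random-intercept indicator matrix: Z[i, g] = 1 when observation i
# belongs to group g, and 0 elsewhere.
Z = sp.csc_matrix(
    (np.ones(n_obs), (np.arange(n_obs), idx)),
    shape=(n_obs, n_grp),
)
```

Each row of this Z has exactly one nonzero entry, so the matrix stores n_obs values regardless of the number of groups.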

Attributes

X_re
X_re: NDArray[np.float64] | list[NDArray[np.float64]] | None = field(default=None, repr=False)
group_ids_list
group_ids_list: list[NDArray[np.intp]] = field(factory=list, repr=False)
group_indices
group_indices: dict[str, NDArray[np.intp]] = field(factory=dict, repr=False)
grouping_vars
grouping_vars: tuple[str, ...] = field(converter=to_tuple, validator=is_tuple_of_str)
metadata
metadata: dict = field(factory=dict)
n_groups
n_groups: dict[str, int] = field(factory=dict)
n_groups_list
n_groups_list: list[int] = field(factory=list)
random_names
random_names: list[str] = field(factory=list)
re_structure
re_structure: str = field(default='intercept', validator=(is_choice_str(('intercept', 'slope', 'diagonal', 'crossed', 'nested', 'mixed'))))
term_names
term_names: tuple[str, ...] = field(converter=to_tuple, factory=tuple, validator=is_tuple_of_str)

RankInfo

Rank deficiency information for a design matrix.

Stores the result of pivoted QR rank detection: which columns are numerically redundant (linearly dependent on the retained columns, so their pivots are numerically zero) and should be dropped before fitting.
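
Pivoted QR rank detection can be sketched with scipy.linalg.qr. The function name detect_rank_sketch and the relative tolerance rule are assumptions made for illustration; this is not the implementation of detect_rank_deficiency().

```python
import numpy as np
from scipy.linalg import qr


def detect_rank_sketch(X, rel_tol=1e-10):
    """Illustrative pivoted-QR rank detection (hypothetical helper)."""
    # Column-pivoted QR: X[:, piv] == Q @ R, with |diag(R)| non-increasing.
    _, R, piv = qr(X, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))
    # Assumed rule: a pivot counts as zero when it is tiny relative
    # to the largest pivot.
    tol = rel_tol * diag[0] if diag.size else 0.0
    rank = int(np.sum(diag > tol))
    kept = np.sort(piv[:rank]).astype(np.intp)
    dropped = tuple(int(j) for j in np.sort(piv[rank:]))
    return rank, kept, dropped


# Third column is exactly twice the second, so one of them is redundant.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones(4), x, 2.0 * x])
rank, kept, dropped = detect_rank_sketch(X)
```

Which of two collinear columns ends up dropped depends on the pivot order chosen by the factorization, so code consuming the result should rely on the reported indices rather than on column position.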

Created by: detect_rank_deficiency()
Consumed by: build_bundle_from_data(), model._rank
Augmented by: Never

Attributes:

Name             Type             Description
rank             int              Effective numerical rank of X.
p                int              Original number of columns in X.
kept_indices     NDArray[intp]    Sorted indices of columns retained (length rank).
dropped_indices  tuple[int, ...]  Indices of columns dropped as rank-deficient.
dropped_names    tuple[str, ...]  Names of dropped columns.

Examples:

>>> import numpy as np
>>> from data import RankInfo
>>> info = RankInfo(
...     rank=2, p=3,
...     kept_indices=np.array([0, 1]),
...     dropped_indices=(2,),
...     dropped_names=("x2",),
... )
>>> info.is_deficient
True

Attributes

dropped_indices
dropped_indices: tuple[int, ...] = field(converter=tuple)
dropped_names
dropped_names: tuple[str, ...] = field(converter=tuple, validator=is_tuple_of_str)
is_deficient
is_deficient: bool

Whether the matrix is rank-deficient.

kept_indices
kept_indices: NDArray[np.intp] = field(validator=is_ndarray)
p
p: int = field(validator=is_positive_int)
rank
rank: int = field(validator=is_nonnegative_int)
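
How the stored fields might be applied before fitting can be sketched with plain NumPy. The values mirror the RankInfo example above; the matrix X, the column names, and the rule is_deficient = rank < p are assumptions for illustration, since the exact definition is not stated here.

```python
import numpy as np

# Hypothetical design matrix whose third column duplicates information
# in the second (x2 = 2 * x).
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 4.0]])
X_names = ("Intercept", "x", "x2")

# Field values as in the RankInfo example.
rank, p = 2, 3
kept_indices = np.array([0, 1], dtype=np.intp)
dropped_indices = (2,)

# Assumed definition: deficient when the rank is below the column count.
is_deficient = rank < p

# Subset the matrix and the names to the retained columns.
X_reduced = X[:, kept_indices]
kept_names = tuple(X_names[j] for j in kept_indices)
dropped_names = tuple(X_names[j] for j in dropped_indices)
```

After subsetting, X_reduced has rank columns and full column rank, which is the shape a solver would expect.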

Functions