Data bundle containers for model computation.
Classes:
| Name | Description |
|---|---|
| DataBundle | Validated model data (valid observations only). |
| REInfo | Random effects metadata. |
| RankInfo | Rank deficiency information for a design matrix. |
Classes¶
DataBundle¶
Validated model data (valid observations only).
Contains design matrices, response vector, and metadata for model fitting. Only valid (non-NA) observations are included in X, y, and related arrays. The valid_mask tracks which rows from the original data were retained.
Created by: build_bundle_from_data()
Consumed by: fit_model(), dispatch_marginal_computation(), dispatch_infer()
Augmented by: Never
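As a rough illustration of how valid_mask and n_total relate to the retained rows, the sketch below builds a mask with NumPy and scatters per-row results back to the original row order. The raw arrays and the names `fitted_valid` and `fitted_full` are hypothetical; how build_bundle_from_data() actually computes the mask is not shown in this reference.

```python
import numpy as np

# Hypothetical raw data with missing values (NaN) in predictor and response.
x_raw = np.array([0.0, 1.0, np.nan, 3.0, 4.0])
y_raw = np.array([1.0, np.nan, 2.5, 3.0, 5.0])

# A row is valid only if every value in it is non-missing.
valid_mask = ~(np.isnan(x_raw) | np.isnan(y_raw))
n_total = x_raw.shape[0]  # original row count before NA removal

# Keep only valid rows, as X and y in a bundle would.
X = np.column_stack([np.ones(valid_mask.sum()), x_raw[valid_mask]])
y = y_raw[valid_mask]

# Stand-in for a per-observation result computed on the valid rows only.
fitted_valid = y.copy()

# Scatter the result back to the original length; dropped rows stay NaN.
fitted_full = np.full(n_total, np.nan)
fitted_full[valid_mask] = fitted_valid
```

Keeping the mask alongside the filtered arrays is what allows results of length n to be aligned with the original n_total rows later on.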
Attributes:
| Name | Type | Description |
|---|---|---|
| X | NDArray[floating] | Fixed effects design matrix (n x p array). |
| y | NDArray[floating] | Response vector (1D array of length n). |
| X_names | tuple[str, ...] | Tuple of column names for X (length p). |
| y_name | str | Name of the response variable. |
| valid_mask | NDArray[bool_] | Boolean mask indicating valid rows from original data. |
| n_total | int | Original row count before NA removal. |
| Z | csc_matrix \| None | Random effects design matrix (sparse csc_matrix, or None). |
| weights | NDArray[floating] \| None | Observation weights (1D array or None). |
| offset | NDArray[floating] \| None | Model offset (1D array or None). |
| factor_levels | dict[str, tuple[str, ...]] | Dict mapping factor names to their level tuples. |
| re_metadata | REInfo \| None | Random effects metadata (or None for fixed-effects models). |
Examples:
>>> import numpy as np
>>> from data import DataBundle
>>> bundle = DataBundle(
... X=np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]),
... y=np.array([1.0, 2.0, 3.0]),
... X_names=["Intercept", "x"],
... y_name="y",
... valid_mask=np.array([True, True, True]),
... n_total=3,
... )
>>> bundle.n
3
>>> bundle.p
2
Attributes¶
X¶
X: NDArray[np.floating] = field(validator=is_ndarray)
X_names¶
X_names: tuple[str, ...] = field(converter=to_tuple, validator=is_tuple_of_str)
Z¶
Z: sp.csc_matrix | None = field(default=None, validator=is_optional_sparse_csc)
contrast_types¶
contrast_types: dict[str, str] = field(factory=dict)
factor_levels¶
factor_levels: dict[str, tuple[str, ...]] = field(factory=dict, converter=to_tuple_of_tuples)
has_random_effects¶
has_random_effects: bool
Whether the bundle includes random effects.
Returns:
| Type | Description |
|---|---|
| bool | True if Z matrix is present, False otherwise. |
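To make the role of Z concrete, here is a minimal sketch of a random-intercept indicator matrix stored in CSC format, together with the presence check described above. The `group_idx` array and the standalone `has_random_effects` helper are hypothetical illustrations; the library's own construction of Z may differ.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical group index per observation (6 observations, 3 groups).
group_idx = np.array([0, 0, 1, 1, 2, 2])
n_obs = group_idx.shape[0]
n_groups = int(group_idx.max()) + 1

# Random-intercept indicator matrix: Z[i, j] = 1 if observation i is in group j.
Z = sp.csc_matrix(
    (np.ones(n_obs), (np.arange(n_obs), group_idx)),
    shape=(n_obs, n_groups),
)


def has_random_effects(Z):
    """Mirror the documented rule: True when a Z matrix is present."""
    return Z is not None
```

Each row of this Z has exactly one nonzero entry, which is why a sparse format is a natural fit for random effects design matrices.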
n¶
n: int
Number of valid observations.
Returns:
| Type | Description |
|---|---|
| int | Number of rows in the design matrix X. |
n_total¶
n_total: int = field(validator=is_positive_int)
offset¶
offset: NDArray[np.floating] | None = field(default=None, validator=is_optional_ndarray)
p¶
p: int
Number of fixed effect parameters.
Returns:
| Type | Description |
|---|---|
| int | Number of columns in the design matrix X. |
rank¶
rank: int
Effective rank of the fixed effects design matrix.
Returns:
| Type | Description |
|---|---|
| int | Numerical rank if rank_info is available, otherwise p (full rank). |
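The derived properties n, p, and rank follow simple rules that can be sketched with plain dataclasses. `BundleSketch` and `RankSketch` below are hypothetical stand-ins that carry only the fields needed for the illustration; they are not the library's classes and omit all validation.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass(frozen=True)
class RankSketch:
    """Hypothetical stand-in for RankInfo, holding only the rank."""

    rank: int


@dataclass(frozen=True)
class BundleSketch:
    """Hypothetical stand-in for DataBundle with only X and rank_info."""

    X: np.ndarray
    rank_info: Optional[RankSketch] = None

    @property
    def n(self) -> int:
        # Number of valid observations: rows of X.
        return self.X.shape[0]

    @property
    def p(self) -> int:
        # Number of fixed effect parameters: columns of X.
        return self.X.shape[1]

    @property
    def rank(self) -> int:
        # Fall back to p (full rank) when no rank information is attached.
        return self.p if self.rank_info is None else self.rank_info.rank


X = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 2.0], [1.0, 2.0, 4.0]])
full = BundleSketch(X)
deficient = BundleSketch(X, rank_info=RankSketch(rank=2))
```

Here `full.rank` falls back to `full.p`, while `deficient.rank` reports the value carried by the attached rank information.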
rank_info¶
rank_info: RankInfo | None = field(default=None)
re_metadata¶
re_metadata: REInfo | None = field(default=None)
response_levels¶
response_levels: tuple[str, ...] | None = field(default=None, converter=to_optional_tuple, validator=is_optional_tuple_of_str)
valid_mask¶
valid_mask: NDArray[np.bool_] = field(validator=is_ndarray)
weights¶
weights: NDArray[np.floating] | None = field(default=None, validator=is_optional_ndarray)
y¶
y: NDArray[np.floating] = field(validator=is_ndarray)
y_name¶
y_name: str = field(validator=(validators.instance_of(str)))
REInfo¶
Random effects metadata.
Stores information about random effect grouping structure, including the grouping variables, number of groups per variable, and indices mapping observations to groups.
Created by: build_bundle_from_data()
Consumed by: build_mixed_post_fit_state(), resolve_conditional(), fit mixed-model workflows
Augmented by: Never
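The grouping fields can be pictured as integer codes derived from the raw grouping columns. The sketch below uses `np.unique(..., return_inverse=True)` for that; the `data` dictionary is hypothetical, and the assumption that the list-valued fields follow the order of grouping_vars is ours, not something this reference states.

```python
import numpy as np

# Hypothetical raw grouping columns for six observations.
data = {
    "subject": np.array(["s1", "s1", "s2", "s2", "s3", "s3"]),
    "item": np.array(["a", "b", "a", "b", "a", "b"]),
}
grouping_vars = tuple(data)

group_indices = {}
n_groups = {}
for name in grouping_vars:
    # levels: sorted unique labels; idx: position of each observation's label.
    levels, idx = np.unique(data[name], return_inverse=True)
    group_indices[name] = idx.astype(np.intp)
    n_groups[name] = int(levels.size)

# List-valued counterparts, assumed here to follow the order of grouping_vars.
group_ids_list = [group_indices[name] for name in grouping_vars]
n_groups_list = [n_groups[name] for name in grouping_vars]
```

With these codes, observation i belongs to group `group_indices[name][i]` of the grouping variable `name`.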
Attributes:
| Name | Type | Description |
|---|---|---|
| grouping_vars | tuple[str, ...] | Tuple of grouping variable names (e.g., ("subject", "item")). |
| n_groups | dict[str, int] | Dictionary mapping grouping variable to number of groups. |
| group_indices | dict[str, NDArray[intp]] | Dictionary mapping grouping variable to group index array. |
| term_names | tuple[str, ...] | Tuple of random effect term names (e.g., ("(1\|subject)",)). |
| group_ids_list | list[NDArray[intp]] | List of group ID arrays for each factor (for PLS fitting). |
| n_groups_list | list[int] | List of number of groups per factor (for PLS fitting). |
| re_structure | str | Random effects structure type (intercept/slope/diagonal/crossed/nested/mixed). |
| random_names | list[str] | Names of random effect terms (e.g., ["Intercept", "Days"]). |
| X_re | NDArray[float64] \| list[NDArray[float64]] \| None | Random effects covariates for slope models (optional). |
| metadata | dict | Dictionary with additional RE structure info for lambda_builder. |
Examples:
>>> import numpy as np
>>> from data import REInfo
>>> meta = REInfo(
... grouping_vars=["subject"],
... n_groups={"subject": 10},
... group_indices={"subject": np.array([0, 0, 1, 1, 2, 2])},
... term_names=["(1|subject)"],
... )
>>> meta.grouping_vars
('subject',)
Attributes¶
X_re¶
X_re: NDArray[np.float64] | list[NDArray[np.float64]] | None = field(default=None, repr=False)
group_ids_list¶
group_ids_list: list[NDArray[np.intp]] = field(factory=list, repr=False)
group_indices¶
group_indices: dict[str, NDArray[np.intp]] = field(factory=dict, repr=False)
grouping_vars¶
grouping_vars: tuple[str, ...] = field(converter=to_tuple, validator=is_tuple_of_str)
metadata¶
metadata: dict = field(factory=dict)
n_groups¶
n_groups: dict[str, int] = field(factory=dict)
n_groups_list¶
n_groups_list: list[int] = field(factory=list)
random_names¶
random_names: list[str] = field(factory=list)
re_structure¶
re_structure: str = field(default='intercept', validator=(is_choice_str(('intercept', 'slope', 'diagonal', 'crossed', 'nested', 'mixed'))))
term_names¶
term_names: tuple[str, ...] = field(converter=to_tuple, factory=tuple, validator=is_tuple_of_str)
RankInfo¶
Rank deficiency information for a design matrix.
Stores the result of pivoted QR rank detection: which columns are numerically zero and should be dropped before fitting.
Created by: detect_rank_deficiency()
Consumed by: build_bundle_from_data(), model._rank
Augmented by: Never
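For readers unfamiliar with pivoted QR rank detection, the following sketch shows one common way to obtain the quantities stored here, using `scipy.linalg.qr` with `pivoting=True`. The tolerance rule, the example matrix, and the column names are assumptions made for illustration; detect_rank_deficiency() may use a different threshold or pivoting strategy.

```python
import numpy as np
from scipy.linalg import qr

# Hypothetical design matrix whose third column is exactly twice the second.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones(4), x, 2.0 * x])
names = ("Intercept", "x", "x2")

# Column-pivoted QR: X[:, pivots] = Q @ R, with |diag(R)| non-increasing.
_, R, pivots = qr(X, mode="economic", pivoting=True)
diag = np.abs(np.diag(R))

# Assumed tolerance: relative to the largest diagonal entry of R.
tol = max(X.shape) * np.finfo(X.dtype).eps * diag[0]
rank = int(np.sum(diag > tol))

# The first `rank` pivots are retained; the remaining columns are dropped.
kept_indices = np.sort(pivots[:rank]).astype(np.intp)
dropped_indices = tuple(sorted(int(j) for j in pivots[rank:]))
dropped_names = tuple(names[j] for j in dropped_indices)
```

Which of two collinear columns ends up dropped depends on the pivot order, so downstream code is better off relying on kept_indices and dropped_names than on a fixed column position.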
Attributes:
| Name | Type | Description |
|---|---|---|
| rank | int | Effective numerical rank of X. |
| p | int | Original number of columns in X. |
| kept_indices | NDArray[intp] | Sorted indices of columns retained (length rank). |
| dropped_indices | tuple[int, ...] | Indices of columns dropped as rank-deficient. |
| dropped_names | tuple[str, ...] | Names of dropped columns. |
Examples:
>>> import numpy as np
>>> from data import RankInfo
>>> info = RankInfo(
... rank=2, p=3,
... kept_indices=np.array([0, 1]),
... dropped_indices=(2,),
... dropped_names=("x2",),
... )
>>> info.is_deficient
True
Attributes¶
dropped_indices¶
dropped_indices: tuple[int, ...] = field(converter=tuple)
dropped_names¶
dropped_names: tuple[str, ...] = field(converter=tuple, validator=is_tuple_of_str)
is_deficient¶
is_deficient: bool
Whether the matrix is rank-deficient.
kept_indices¶
kept_indices: NDArray[np.intp] = field(validator=is_ndarray)
p¶
p: int = field(validator=is_positive_int)
rank¶
rank: int = field(validator=is_nonnegative_int)