This page lists every container class, builder function, and validator in bossanova.internal.containers. See Container Overview for the pipeline diagram.
Container Classes¶
DataBundle¶
Validated model data (valid observations only)
| Field | Type |
|---|---|
X | NDArray[np.floating] |
X_names | tuple[str, ...] |
Z | sp.csc_matrix \| None |
contrast_types | dict[str, str] |
factor_levels | dict[str, tuple[str, ...]] |
has_random_effects | bool |
n | int |
n_total | int |
offset | NDArray[np.floating] \| None |
p | int |
rank | int |
rank_info | RankInfo \| None |
re_metadata | REInfo \| None |
response_levels | tuple[str, ...] \| None |
valid_mask | NDArray[np.bool_] |
weights | NDArray[np.floating] \| None |
y | NDArray[np.floating] |
y_name | str |
REInfo¶
Random effects metadata
| Field | Type |
|---|---|
X_re | NDArray[np.float64] \| list[NDArray[np.float64]] \| None |
group_ids_list | list[NDArray[np.intp]] |
group_indices | dict[str, NDArray[np.intp]] |
grouping_vars | tuple[str, ...] |
metadata | dict |
n_groups | dict[str, int] |
n_groups_list | list[int] |
random_names | list[str] |
re_structure | str |
term_names | tuple[str, ...] |
RankInfo¶
Rank deficiency information for a design matrix
| Field | Type |
|---|---|
dropped_indices | tuple[int, ...] |
dropped_names | tuple[str, ...] |
is_deficient | bool |
kept_indices | NDArray[np.intp] |
p | int |
rank | int |
MathDisplay¶
Display wrapper for model equation with IPython rich display
| Field | Type |
|---|---|
equation | str |
explanations | tuple[str, ...] |
TermInfo¶
Parsed information about a single model term
| Field | Type |
|---|---|
base_var | str |
explanation | str |
latex | str |
name | str |
symbol | str |
term_type | str |
Condition¶
A conditioning specification in explore formula
| Field | Type |
|---|---|
at_quantile | int \| None |
at_range | int \| None |
at_values | tuple \| None |
contrast_expr | ContrastExpr \| None |
var | str |
ContrastExpr¶
Bracket contrast expression: Drug[A - B, C - D]
| Field | Type |
|---|---|
items | tuple[ContrastItem, ...] |
var | str |
ContrastItem¶
A single contrast: left operand minus right operand
| Field | Type |
|---|---|
left | ContrastOperand |
right | ContrastOperand |
ContrastOperand¶
One side of a bracket contrast item
| Field | Type |
|---|---|
is_wildcard | bool |
levels | tuple[str, ...] |
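
As a rough illustration of how ContrastExpr, ContrastItem, and ContrastOperand nest, the sketch below models `Drug[A - B, C - D]` with frozen dataclasses. The field names mirror the tables above, but the classes are hypothetical stand-ins rather than the bossanova implementations, and the real parser may populate the fields differently (the `is_wildcard=False` default in particular is an assumption).

```python
from dataclasses import dataclass


# Hypothetical stand-ins that mirror the documented fields; these are NOT
# the actual bossanova classes.
@dataclass(frozen=True)
class ContrastOperand:
    levels: tuple[str, ...]
    is_wildcard: bool = False  # assumed default, not documented


@dataclass(frozen=True)
class ContrastItem:
    left: ContrastOperand
    right: ContrastOperand


@dataclass(frozen=True)
class ContrastExpr:
    var: str
    items: tuple[ContrastItem, ...]


# Drug[A - B, C - D]: two contrasts on the variable "Drug".
expr = ContrastExpr(
    var="Drug",
    items=(
        ContrastItem(ContrastOperand(("A",)), ContrastOperand(("B",))),
        ContrastItem(ContrastOperand(("C",)), ContrastOperand(("D",))),
    ),
)

# Render each item back into a "left - right" label.
labels = [
    f"{'+'.join(item.left.levels)} - {'+'.join(item.right.levels)}"
    for item in expr.items
]
print(labels)  # ['A - B', 'C - D']
```

Keeping each operand as a tuple of levels leaves room for operands that combine several levels, which is one plausible reading of the `levels` field.
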
ExploreFormulaError¶
Error in explore formula syntax
| Field | Type |
|---|---|
formula | `` |
position | `` |
ExploreFormulaSpec¶
Parsed explore formula
| Field | Type |
|---|---|
conditions | tuple[Condition, ...] |
contrast_degree | int \| None |
contrast_expr | ContrastExpr \| None |
contrast_level_ordering | tuple[str, ...] \| None |
contrast_ref | str \| None |
contrast_type | str \| None |
focal_at_quantile | int \| None |
focal_at_range | int \| None |
focal_at_values | tuple[float \| str, ...] \| None |
focal_var | str |
has_conditions | bool |
has_contrast | bool |
has_contrast_expr | bool |
has_rhs_contrasts | bool |
FitState¶
Immutable fitting result
| Field | Type |
|---|---|
XtWX_inv | NDArray[np.floating] \| None |
coef | NDArray[np.floating] |
converged | bool |
deviance | float \| None |
df_resid | float |
dispersion | float \| None |
fitted | NDArray[np.floating] |
irls_weights | NDArray[np.floating] \| None |
leverage | NDArray[np.floating] |
loglik | float |
n_iter | int |
null_deviance | float \| None |
residuals | NDArray[np.floating] |
sigma | float \| None |
theta | NDArray[np.floating] \| None |
u | NDArray[np.floating] \| None |
vcov | NDArray[np.floating] |
FormulaSpec¶
Learned formula encoding — everything needed to replay on new data
| Field | Type |
|---|---|
contrast_matrices | dict[str, NDArray] |
contrast_types | dict[str, str] |
custom_contrasts | dict[str, NDArray] |
factors | dict[str, tuple[str, ...]] |
formula | str |
has_intercept | bool |
has_random_effects | bool |
nested_metadata | dict |
re_terms | tuple |
response_transform | tuple[str, ...] \| None |
response_var | str \| None |
rhs_terms | tuple |
transform_state | dict[str, dict] |
transforms | dict[str, object] |
uncorr_metadata | dict |
CVState¶
Cross-validation results for model evaluation
| Field | Type |
|---|---|
accuracy | float \| None |
auc | float \| None |
deviance | float \| None |
f1 | float \| None |
fold_assignments | np.ndarray \| None |
fold_metrics | dict[str, np.ndarray] |
k | int |
mae | float |
oos_predictions | np.ndarray \| None |
oos_residuals | np.ndarray \| None |
r_squared | float |
rmse | float |
sensitivity | float \| None |
specificity | float \| None |
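
The pooled regression metrics can be related to the out-of-sample predictions as sketched below. This assumes that `rmse`, `mae`, and `r_squared` follow their textbook definitions over all held-out observations; the source does not say how bossanova aggregates across folds, and `cv_metrics` is a hypothetical helper, not part of the library.

```python
import math


def cv_metrics(y, oos_predictions):
    """Textbook RMSE, MAE and R^2 from out-of-sample predictions.

    Hypothetical helper; bossanova's own aggregation may differ.
    """
    residuals = [yi - pi for yi, pi in zip(y, oos_predictions)]
    n = len(residuals)
    ss_res = sum(r * r for r in residuals)
    y_mean = sum(y) / n
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r_squared": 1.0 - ss_res / ss_tot,
    }


# Four observations, each predicted while its fold was held out.
metrics = cv_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(metrics)
```
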
InferenceState¶
Inference results that augment params or estimates
| Field | Type |
|---|---|
alternative | str |
boot_samples | np.ndarray \| None |
ci_lower | np.ndarray |
ci_upper | np.ndarray |
conf_level | float |
df | np.ndarray |
method | str |
n_resamples | int \| None |
null | float |
p_value | np.ndarray |
perm_samples | np.ndarray \| None |
pre | np.ndarray \| None |
pre_sd | np.ndarray \| None |
se | np.ndarray |
statistic | np.ndarray |
JointTestState¶
Joint hypothesis test results for model terms
| Field | Type |
|---|---|
df1 | np.ndarray |
df2 | np.ndarray \| None |
p_value | np.ndarray |
ss_type | str |
statistic | np.ndarray |
terms | tuple[str, ...] |
test_type | str |
ResamplesState¶
Unified resampling results from bootstrap or permutation inference
| Field | Type |
|---|---|
context | str |
method | str |
n_resamples | int |
names | tuple[str, ...] |
observed | np.ndarray |
samples | np.ndarray |
MeeState¶
Marginal effects / estimated marginal means results
| Field | Type |
|---|---|
L_matrix | np.ndarray \| None |
L_matrix_link | np.ndarray \| None |
ci_lower | np.ndarray \| None |
ci_upper | np.ndarray \| None |
conf_level | float \| None |
contrast_method | str \| None |
df | np.ndarray \| None |
effect_scale | str |
estimate | np.ndarray |
explore_formula | str |
focal_var | str |
grid | 'pl.DataFrame' |
has_inference | bool |
how | str |
inference_method | str \| None |
link | str \| None |
n_contrast_levels | int \| None |
p_value | np.ndarray \| None |
se | np.ndarray \| None |
statistic | np.ndarray \| None |
type | str |
ProfileState¶
Profile likelihood state for variance component CIs
| Field | Type |
|---|---|
ci_lower_sd | NDArray[np.floating] |
ci_sd | dict[str, tuple[float, float]] |
ci_theta | dict[str, tuple[float, float]] |
ci_upper_sd | NDArray[np.floating] |
conf_level | float |
dev_opt | float |
spline_forward | dict[str, Any] |
spline_reverse | dict[str, Any] |
table | 'pl.DataFrame' |
threshold | float |
VaryingSpreadState¶
Variance components for mixed models
| Field | Type |
|---|---|
ci_lower | dict[str, float] \| None |
ci_method | str \| None |
ci_upper | dict[str, float] \| None |
components | 'pl.DataFrame' |
conf_level | float \| None |
has_inference | bool |
icc | float \| None |
rho | dict[str, float] |
sigma2 | float |
tau2 | dict[str, float] |
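
For orientation, the `icc` field can be related to `tau2` and `sigma2` through the usual intraclass correlation, between-group variance divided by total variance. The sketch below assumes `tau2` holds only random-intercept variances, one per key; how bossanova handles random slopes or multiple grouping factors is not stated in the source, and `icc_from_components` is a hypothetical helper.

```python
def icc_from_components(tau2, sigma2):
    """Intraclass correlation: between-group variance over total variance.

    Assumes every value in `tau2` is a random-intercept variance.
    Returns None when the total variance is zero.
    """
    between = sum(tau2.values())
    total = between + sigma2
    return between / total if total > 0 else None


# Hypothetical key name; the real keys of `tau2` are not documented here.
icc = icc_from_components({"subject": 4.0}, sigma2=12.0)
print(icc)  # 0.25
```
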
VaryingState¶
Random effects (BLUPs) for mixed models
| Field | Type |
|---|---|
conf_level | float \| None |
effects | dict[str, np.ndarray] |
grid | 'pl.DataFrame' |
grouping_var | str |
has_inference | bool |
n_groups | int |
pi_lower | dict[str, np.ndarray] \| None |
pi_upper | dict[str, np.ndarray] \| None |
PredictionConfig¶
Configuration that produced a PredictionState
| Field | Type |
|---|---|
allow_new_levels | bool |
formula_spec | Any |
newdata | 'pl.DataFrame \| None' |
pred_type | str |
training_data | 'pl.DataFrame \| None' |
varying | str |
PredictionState¶
Prediction results with optional intervals
| Field | Type |
|---|---|
X_pred | np.ndarray \| None |
ci_lower | np.ndarray \| None |
ci_upper | np.ndarray \| None |
conf_level | float \| None |
config | PredictionConfig \| None |
cv_fitted | np.ndarray \| None |
cv_fold | np.ndarray \| None |
cv_residual | np.ndarray \| None |
fitted | np.ndarray |
grid | 'pl.DataFrame \| None' |
has_cv | bool |
has_inference | bool |
interval_type | str \| None |
link | np.ndarray \| None |
se | np.ndarray \| None |
SimulationInferenceState¶
Simulation inference results for post-fit or power analysis simulations
| Field | Type |
|---|---|
alpha | float |
bias | dict[str, float] |
coverage | dict[str, float] |
n_sims | int |
power | dict[str, float] |
rmse | dict[str, float] |
sim_mean | np.ndarray \| None |
sim_quantiles | dict[str, np.ndarray] |
sim_sd | np.ndarray \| None |
sim_type | str |
true_coef | dict[str, float] |
Distribution¶
Protocol for distribution objects that can generate random samples
ModelSpec¶
Immutable model configuration
| Field | Type |
|---|---|
family | str |
fixed_terms | tuple[str, ...] |
formula | str |
has_random_effects | bool |
link | str |
method | str |
random_terms | tuple[str, ...] |
response_var | str |
SimulationSpec¶
Specification for data generation in simulation-first workflows
| Field | Type |
|---|---|
coef | dict[str, float] |
distributions | dict[str, Distribution] |
n | int |
re_spec | dict[str, VaryingSpec] |
seed | int \| None |
sigma | float |
VaryingSpec¶
Specification for random effect grouping in simulation
| Field | Type |
|---|---|
correlations | dict[tuple[str, str], float] |
n | int |
n_per | int \| None |
sd | float |
slope_sds | dict[str, float] |
Col¶
Namespace for all column name constants used across DataFrame outputs
| Field | Type |
|---|---|
AIC | str |
AIC_R | str |
BIAS | str |
BIC | str |
BIC_R | str |
CHI2 | str |
CHISQ | str |
CI_INCREASE_FACTOR | str |
CI_LOWER | str |
CI_METHOD | str |
CI_UPPER | str |
COHENS_D | str |
COMPONENT | str |
CONTRAST | str |
CONVERGED | str |
COOKSD | str |
CORR | str |
COVERAGE | str |
CV_DEVIANCE | str |
CV_FITTED | str |
CV_FOLD | str |
CV_K | str |
CV_MAE | str |
CV_MAE_SD | str |
CV_RESIDUAL | str |
CV_RMSE | str |
CV_RMSE_SD | str |
CV_RSQUARED | str |
CV_RSQUARED_SD | str |
CV_SCORE | str |
CV_SE | str |
DELTA_AIC | str |
DELTA_BIC | str |
DEVIANCE | str |
DEV_DIFF | str |
DF | str |
DF1 | str |
DF2 | str |
DF_MODEL | str |
DF_RESID | str |
DIFF | str |
DIFF_SE | str |
DISPERSION | str |
D_LOWER | str |
D_UPPER | str |
EFFECT1 | str |
EFFECT2 | str |
EMPIRICAL_SE | str |
ESTIMATE | str |
ETA_SQ | str |
FITTED | str |
FSTATISTIC | str |
FSTATISTIC_PVALUE | str |
F_RATIO | str |
F_STAT | str |
GROUP | str |
HAT | str |
ICC | str |
IS_SINGULAR | str |
LEVEL | str |
LINK | str |
LOGLIK | str |
MEAN_SE | str |
MODEL | str |
N | str |
NGROUPS | str |
NOBS | str |
NOBS_MISSING | str |
NOBS_TOTAL | str |
NPAR | str |
NPARAMS | str |
NULL_DEVIANCE | str |
N_FAILED | str |
N_ITER | str |
N_SIMS | str |
N_THETA | str |
OBSERVED | str |
ODDS_RATIO | str |
OPTIMIZER | str |
PI_LOWER_PREFIX | str |
PI_UPPER_PREFIX | str |
POWER | str |
POWER_CI_LOWER | str |
POWER_CI_UPPER | str |
PRE | str |
PRE_R | str |
PRE_SD | str |
PSEUDO_RSQUARED | str |
P_VALUE | str |
RESAMPLE | str |
RESID | str |
RHO_PREFIX | str |
RHS_CONTRAST | str |
RMSE | str |
RSQUARED | str |
RSQUARED_ADJ | str |
RSQUARED_CONDITIONAL | str |
RSQUARED_MARGINAL | str |
RSS | str |
R_SEMI | str |
SE | str |
SIGMA | str |
SIGMA2 | str |
SIM_MEAN | str |
SIM_Q025 | str |
SIM_Q975 | str |
SIM_SD | str |
SS | str |
STATISTIC | str |
STD_RESID | str |
TAU2_PREFIX | str |
TERM | str |
TERM_TYPE | str |
TRUE_VALUE | str |
T_STAT | str |
VALUE | str |
VIF | str |
WEIGHT | str |
Builder Functions¶
| Function | Signature | Description | Module |
|---|---|---|---|
build_cv_state | (k, rmse, mae, r_squared, deviance, accuracy, sensitivity, specificity, f1, auc, fold_metrics, oos_predictions, oos_residuals, fold_assignments) -> CVState | Build a CVState from cross-validation computation | builders |
build_effects_dataframe | (mee, method) -> pl.DataFrame | Build the .effects DataFrame from marginal effects state | builders |
build_fit_state | (coef, vcov, fitted, residuals, leverage, df_resid, loglik, converged, n_iter, sigma, dispersion, null_deviance, deviance, theta, u, irls_weights, XtWX_inv) -> FitState | Build a FitState instance with validation | builders |
build_inference_state | (se, statistic, df, p_value, ci_lower, ci_upper, conf_level, method, null, alternative, n_resamples, boot_samples, perm_samples, pre, pre_sd) -> InferenceState | Build an InferenceState from computed inference values | builders |
build_joint_test_dataframe | (state) -> pl.DataFrame | Build an ANOVA-style DataFrame from joint test results | builders |
build_joint_test_state | (terms, df1, statistic, p_value, test_type, ss_type, df2) -> JointTestState | Build a JointTestState from computed joint test values | builders |
build_mee_resamples | (mee, samples, how) -> ResamplesState \| None | Build ResamplesState from MEE inference if samples are available | builders |
build_mee_state | (grid, estimate, explore_formula, focal_var, mee_type, how, effect_scale, L_matrix, contrast_method, n_contrast_levels, link, L_matrix_link, boot_X_plus, boot_X_minus, boot_delta, se, df, statistic, p_value, ci_lower, ci_upper, conf_level) -> MeeState | Build a MeeState from marginal effects computation | builders |
build_model_spec | (formula, family, link, method, response_var, fixed_terms, random_terms, has_random_effects) -> ModelSpec | Build a ModelSpec from raw inputs | builders |
build_model_spec_from_formula | (formula, family, link, method, structure) -> ModelSpec | Build ModelSpec from a pre-parsed formula structure and resolve defaults | builders |
build_params_dataframe | (bundle, fit, params_inference) -> pl.DataFrame | Build the .params DataFrame from fit state | builders |
build_params_resamples | (inference, fit_coef, x_names, how) -> ResamplesState \| None | Build ResamplesState from params inference if samples are available | builders |
build_prediction_state | (fitted, link, X_pred, config, se, ci_lower, ci_upper, interval_type, conf_level, grid) -> PredictionState | Build a PredictionState from prediction computation | builders |
build_predictions_dataframe | (pred) -> pl.DataFrame | Build the .predictions DataFrame from prediction state | builders |
build_resamples_dataframe | (rs) -> pl.DataFrame | Build a long-format DataFrame of raw resampled values | builders |
build_resamples_state | (samples, observed, names, method, n_resamples, context) -> ResamplesState | Build a ResamplesState from resampling results | builders |
build_simulation_inference_state | (sim_type, n_sims, sim_mean, sim_sd, sim_quantiles, power, coverage, bias, rmse, alpha, true_coef) -> SimulationInferenceState | Build a SimulationInferenceState from computed values | builders |
build_simulation_spec | (n, distributions, coef, sigma, re_spec, seed) -> SimulationSpec | Build a SimulationSpec for data generation | builders |
build_simulation_spec_from_formula | (formula, n, distributions, coef, sigma, seed) -> SimulationSpec | Build SimulationSpec from formula with defaults for unspecified variables | builders |
build_simulations_dataframe | (simulations, sim_inference) -> pl.DataFrame | Build the .simulations DataFrame with optional inference columns | builders |
build_varying_corr_dataframe | (varying_spread) -> pl.DataFrame | Build the .varying_corr DataFrame from random effect correlations | builders |
build_varying_offsets_dataframe | (varying_offsets) -> pl.DataFrame | Build the .varying_offsets DataFrame from varying state | builders |
build_varying_params_dataframe | (bundle, fit, varying_offsets) -> pl.DataFrame | Build the .varying_params DataFrame (population + offsets) | builders |
build_varying_spec | (n, sd, slope_sds, correlations, n_per) -> VaryingSpec | Build a VaryingSpec for random effect structure | builders |
build_varying_spread_dataframe | (varying_spread) -> pl.DataFrame | Build the .varying_spread DataFrame from variance components | builders |
build_varying_spread_state | (components, sigma2, tau2, rho, icc, ci_lower, ci_upper, conf_level, ci_method) -> VaryingSpreadState | Build a VaryingSpreadState from variance component estimates | builders |
build_varying_state | (grid, effects, grouping_var, n_groups, pi_lower, pi_upper, conf_level) -> VaryingState | Build a VaryingState from computed BLUPs | builders |
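
The state builders in this table appear to follow one pattern: accept computed values, validate them, and return an immutable container. The sketch below illustrates that pattern for the documented `build_joint_test_state` argument order, using a stand-in `JointTestState` with tuples in place of NumPy arrays. The checks shown (matching lengths, p-values in [0, 1]), the `df2=None` default, and the example values `"F"` and `"III"` are all assumptions, not the library's documented behavior.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class JointTestState:
    """Stand-in container; NOT the bossanova class (tuples replace arrays)."""

    terms: tuple
    df1: tuple
    statistic: tuple
    p_value: tuple
    test_type: str
    ss_type: str
    df2: Optional[tuple] = None


def build_joint_test_state(terms, df1, statistic, p_value, test_type, ss_type, df2=None):
    """Validate inputs, then return an immutable state (assumed checks)."""
    n_terms = len(terms)
    per_term = {"df1": df1, "statistic": statistic, "p_value": p_value}
    if df2 is not None:
        per_term["df2"] = df2
    for name, values in per_term.items():
        if len(values) != n_terms:
            raise ValueError(f"{name} has length {len(values)}, expected {n_terms}")
    if any(not 0.0 <= p <= 1.0 for p in p_value):
        raise ValueError("p_value entries must lie in [0, 1]")
    return JointTestState(
        terms=tuple(terms),
        df1=tuple(df1),
        statistic=tuple(statistic),
        p_value=tuple(p_value),
        test_type=test_type,
        ss_type=ss_type,
        df2=None if df2 is None else tuple(df2),
    )


state = build_joint_test_state(
    terms=["group", "age"],
    df1=[2, 1],
    statistic=[4.2, 0.8],
    p_value=[0.02, 0.37],
    test_type="F",
    ss_type="III",
    df2=[47, 47],
)

try:
    state.ss_type = "II"  # frozen dataclass: assignment is rejected
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Routing construction through a builder keeps validation in one place, so downstream code can treat any state object it receives as already checked.
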
Validators¶
| Function | Signature | Description |
|---|---|---|
is_choice_str | (choices) -> ... | Build a validator for a string constrained to a fixed set of choices |
is_conf_level | (instance, attribute, value) -> None | Validate that a value is a normalized confidence level in (0, 1) |
is_ndarray | (instance, attribute, value) -> None | Validate that a value is a numpy ndarray |
is_nonnegative_int | (instance, attribute, value) -> None | Validate that a value is a non-negative integer |
is_optional_conf_level | (instance, attribute, value) -> None | Validate that a value is a normalized confidence level or None |
is_optional_int | (instance, attribute, value) -> None | Validate that a value is an int or None |
is_optional_ndarray | (instance, attribute, value) -> None | Validate that a value is a numpy ndarray or None |
is_optional_positive_int | (instance, attribute, value) -> None | Validate that a value is a positive integer or None |
is_optional_sparse_csc | (instance, attribute, value) -> None | Validate that a value is a scipy.sparse.csc_matrix or None |
is_optional_str | (instance, attribute, value) -> None | Validate that a value is a string or None |
is_optional_str_key_dict | (instance, attribute, value) -> None | Validate that a value is a dict with string keys or None |
is_optional_tuple_of_str | (instance, attribute, value) -> None | Validate that a value is a tuple of strings or None |
is_positive_int | (instance, attribute, value) -> None | Validate that a value is a positive integer |
is_tuple_of_str | (instance, attribute, value) -> None | Validate that a value is a tuple of strings |
normalize_conf_level | (conf_level) -> float | Normalize conf_level to a float in (0, 1) |
normalize_optional_conf_level | (conf_level) -> float \| None | Normalize an optional confidence level |
validate_correlations | (instance, attribute, value) -> None | Validate correlation values are in [-1, 1] |
validate_sigma | (instance, attribute, value) -> None | Validate sigma is non-negative |
validate_slope_sds | (instance, attribute, value) -> None | Validate slope SDs are non-negative |
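
The `(instance, attribute, value) -> None` signature matches the validator call convention of the attrs library; whether bossanova actually uses attrs is an assumption here. The sketch below re-implements three of the listed names with the standard library only, to show the two shapes in the table: a plain validator and a factory (`is_choice_str`) that returns one. The error types, the messages, and the percentage handling in `normalize_conf_level` (95 read as 0.95) are guesses, not documented behavior.

```python
from types import SimpleNamespace


def is_choice_str(choices):
    """Factory: return a validator accepting only strings in `choices`."""
    allowed = tuple(choices)

    def validator(instance, attribute, value):
        if not isinstance(value, str) or value not in allowed:
            raise ValueError(
                f"{attribute.name} must be one of {allowed}, got {value!r}"
            )

    return validator


def is_conf_level(instance, attribute, value):
    """Plain validator: value must be a real number strictly inside (0, 1)."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number or not 0 < value < 1:
        raise ValueError(f"{attribute.name} must be in (0, 1), got {value!r}")


def normalize_conf_level(conf_level):
    """Return a float in (0, 1).

    ASSUMPTION: values above 1 are read as percentages (95 -> 0.95).
    """
    level = float(conf_level)
    if level > 1:
        level /= 100.0
    if not 0 < level < 1:
        raise ValueError(f"conf_level out of range: {conf_level!r}")
    return level


# Stand-in for the attribute object a validator receives; only .name is used.
attr = SimpleNamespace(name="alternative")
check_alternative = is_choice_str(["two-sided", "greater", "less"])

check_alternative(None, attr, "greater")  # passes silently
is_conf_level(None, SimpleNamespace(name="conf_level"), 0.9)  # passes silently
normalized = normalize_conf_level(95)
```

A validator signals failure by raising and returns `None` on success, which is why every non-factory entry in the table is typed `-> None`.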