This guide covers the practical patterns and conventions for working on bossanova.
Getting Started¶
```shell
# Clone and install
git clone https://github.com/sciminds/bossanova.git
cd bossanova
pixi install

# Verify everything works
pixi run lint    # ruff + ty check
pixi run test    # unit tests
pixi run parity  # R parity tests (requires R + lme4)
```

All development tasks are managed through pixi. See the full task list in pyproject.toml.
Project Layout¶
```
bossanova/
├── internal/                  # ALL implementation lives here
│   ├── containers/            # Data definitions (the "Entities")
│   │   ├── structs/           # Frozen data classes
│   │   │   ├── specs.py       # ModelSpec — formula, family, link
│   │   │   ├── data.py        # DataBundle — X, y, groups, metadata
│   │   │   ├── state.py       # FitState, InferenceState, PredictionState
│   │   │   ├── explore.py     # MeeState, ExploreFormula
│   │   │   ├── display.py     # Display/rendering state
│   │   │   └── formula.py     # Formula-level containers
│   │   ├── builders/          # Smart constructors for containers
│   │   ├── validators.py      # is_ndarray, is_optional_ndarray
│   │   └── schemas.py         # Polars DataFrame output schemas
│   │
│   ├── design/                # Design matrix coding (treatment, sum, etc.)
│   ├── formula/               # Formula parsing, design matrices
│   ├── marginal/              # EMMs, slopes, contrasts
│   ├── fit/                   # Model fitting, diagnostics, convergence
│   ├── infer/                 # Inference (bootstrap, permutation, CV, resample/)
│   ├── compare/               # Model comparison (LRT, AIC, F-test)
│   ├── simulation/            # Data simulation
│   ├── rendering/             # Summary display formatting
│   │
│   ├── maths/                 # Pure math, backend-aware
│   │   ├── backend/           # JAX/NumPy dispatch (ArrayOps, get_ops)
│   │   ├── solvers/           # QR, IRLS, PLS, PIRLS
│   │   ├── linalg/            # QR, SVD, Cholesky, sparse ops
│   │   ├── inference/         # Statistical inference math
│   │   ├── family/            # GLM families and link functions
│   │   ├── distributions/     # Distribution utilities
│   │   └── rng.py             # Backend-agnostic RNG
│   │
│   └── viz/                   # Plotting implementations
│
├── model/                     # User-facing model class (thin glue)
├── distributions/             # User-facing distribution factories
├── data/                      # User-facing dataset loading
├── expressions.py             # User-facing formula transforms
└── __init__.py                # Re-exports: compare, viz, lrt
```

Thinking in ECS¶
bossanova’s development workflow is container-first: define the data shape before writing logic.
The workflow¶
1. DEFINE the container → What fields? What types? What invariants?
2. IMPLEMENT the operation → Accept containers in, return containers out
3. WIRE into model class → Thin glue: check → call → store → return self

Never skip step 1. Never start at step 3.
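The three steps can be sketched end to end. This is a hypothetical, dependency-free illustration: it uses stdlib dataclasses in place of attrs (`frozen=True` plus `dataclasses.replace()` mirroring `@frozen` plus `attrs.evolve()`), and every name in it is invented for the sketch — it is not bossanova's real API.

```python
from dataclasses import dataclass, replace


# Step 1 — DEFINE the container: fields, types, invariants
@dataclass(frozen=True)
class ScaleState:
    mean: float
    sd: float

    def __post_init__(self):
        # Invariant enforced at construction, like an attrs validator
        if self.sd <= 0:
            raise ValueError(f"sd must be positive, got {self.sd}")


# Step 2 — IMPLEMENT the operation: containers in, containers out
def compute_scale(values: list[float]) -> ScaleState:
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return ScaleState(mean=mean, sd=var**0.5)


# Step 3 — WIRE into a model class: check → call → store → return self
class Model:
    def __init__(self, values: list[float]):
        self._values = values
        self._scale: ScaleState | None = None

    def scale(self) -> "Model":
        scale = compute_scale(self._values)  # all logic lives in the operation
        self._scale = scale
        return self


state = Model([1.0, 2.0, 3.0]).scale()._scale
# Frozen containers grow via replace(), never mutation
state2 = replace(state, mean=0.0)
```

The glue method contains no loops, branches on data, or arithmetic — stripping it down leaves only check → call → store → return self, which is the test the guide describes.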
Containers¶
All containers are frozen attrs classes with validators on every field:
```python
from attrs import frozen, field, validators

from bossanova.internal.containers.validators import is_ndarray, is_optional_ndarray


@frozen
class MeeState:
    """Marginal effects / EMM results.

    Created by: compute_emm(), compute_slopes()
    Consumed by: model.effects property, compute_mee_inference()
    Augmented by: attrs.evolve() after inference adds SEs/CIs
    """

    grid: pl.DataFrame = field(repr=False)
    estimate: np.ndarray = field(validator=is_ndarray)
    type: str = field(validator=validators.in_(("means", "slopes", "contrasts")))

    # Optional — added via evolve() after inference
    se: np.ndarray | None = field(default=None, validator=is_optional_ndarray)
```

Rules:

- Always `@frozen` — containers are immutable. New state = `attrs.evolve(old, field=value)`.
- Validators on every field — `is_ndarray` for required arrays, `is_optional_ndarray` for optional, `validators.in_()` for enums.
- Document lifecycle — the docstring says who creates, consumes, and augments the container.
Operations¶
Operations are pure functions: containers in, containers out.
```python
def compute_emm(
    spec: ModelSpec,
    bundle: DataBundle,
    fit: FitState,
    parsed: ExploreFormula,
) -> MeeState:
    """Compute estimated marginal means."""
    grid = build_reference_grid(bundle, parsed)
    estimates = grid_predictions(spec, bundle, fit, grid)
    return MeeState(grid=grid, estimate=estimates, type="means", ...)
```

Never accept the model class as a parameter. Never produce side effects.
Model class as glue¶
The model orchestrates — it does not implement:
```python
def explore(self, formula: str, **kwargs) -> Self:
    self._check_fitted()
    parsed = parse_explore_formula(formula)  # internal/formula/
    self._mee = compute_emm(                 # internal/marginal/
        self._spec, self._bundle, self._fit,
        parsed, **kwargs,
    )
    return self
```

Test for glue: strip all logic from the method. If what remains is check → call → store → return self, it’s glue. If it has loops, conditionals on data, or array operations, extract to internal/.
Key Patterns¶
Backend dispatch¶
Two patterns for handling JAX/NumPy duality:
Pattern A — Early dispatch for algorithms requiring JAX control flow:
```python
def fit_glm_irls(...):
    backend = get_backend()
    if backend == "jax":
        return _fit_glm_irls_jax(...)    # lax.while_loop
    else:
        return _fit_glm_irls_numpy(...)  # Python while loop
```

Pattern B — Polymorphic ops for semantically identical algorithms:
```python
def compute_leverage(X: np.ndarray) -> np.ndarray:
    ops = get_ops()
    Q, _ = ops.qr(X)
    return ops.np.sum(Q**2, axis=1)
```

Use Pattern A when control flow genuinely differs. Use Pattern B everywhere else.
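Since `get_ops()` is internal to bossanova, the idea behind Pattern B can be sketched standalone with a minimal ops object. The names below (`NumpyOps`, `get_ops`) are illustrative stand-ins, not the real ArrayOps API, and the sketch always dispatches to NumPy:

```python
import numpy as np


class NumpyOps:
    """Minimal stand-in for a backend ops object (illustrative only)."""

    np = np  # the array namespace (numpy here; jax.numpy on the JAX backend)

    @staticmethod
    def qr(X):
        return np.linalg.qr(X)


def get_ops():
    # Real dispatch would pick JAX or NumPy; this sketch always returns NumPy
    return NumpyOps()


def compute_leverage(X):
    # Identical algorithm on either backend: h_i = sum_j Q_ij**2
    ops = get_ops()
    Q, _ = ops.qr(X)
    return ops.np.sum(Q**2, axis=1)


X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
h = compute_leverage(X)
```

The algorithm is written once; only the ops object differs between backends, which is why Pattern B needs no branching.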
JIT caching¶
Cache JIT-compiled functions per backend to avoid recompilation:
```python
_cache: dict[str, Any] = {}


def _make_fn(ops):
    def _core(X, y):
        Q, R = ops.qr(X)
        return ops.np.linalg.solve(R, Q.T @ y)

    return ops.jit(_core)


def _get_fn():
    backend = get_backend()
    if backend not in _cache:
        _cache[backend] = _make_fn(get_ops())
    return _cache[backend]
```

Augmentation via evolve¶
Containers grow through attrs.evolve(), not mutation:
```python
from attrs import evolve

# After fitting
fit = fit_model(spec, bundle)

# After inference — evolve to add SEs, CIs
mee = compute_emm(spec, bundle, fit, parsed)
mee = evolve(mee, se=ses, ci_lower=ci_lo, ci_upper=ci_hi)
```

Polars, not pandas¶
bossanova is Polars-native internally. Pandas is accepted at the API boundary and converted immediately:
```python
def __init__(self, data: pl.DataFrame | PandasDataFrame):
    if not isinstance(data, pl.DataFrame):
        data = pl.from_pandas(data)
    self._data = data  # Always Polars internally
```

Where to Find Things¶
Before writing new code, check what already exists:
| Need | Location | Examples |
|---|---|---|
| Formula parsing | internal/formula/ | parse_explore_formula(), build_design_matrix() |
| Marginal effects | internal/marginal/ | compute_emm(), compute_slopes() |
| Model fitting | internal/fit/ | fit_model(), compute_diagnostics() |
| Inference & resampling | internal/infer/ | dispatch_infer(), bootstrap_inference() |
| Model comparison | internal/compare/ | compare_models(), compute_lrt() |
| Design matrix coding | internal/design/ | treatment_coding(), sum_coding() |
| Linear algebra | internal/maths/linalg/ | QR, SVD, Cholesky |
| Solvers | internal/maths/solvers/ | QR solve (LM), IRLS (GLM), PLS/PIRLS (mixed) |
| Statistical inference | internal/maths/inference/ | SEs, CIs, p-values, Satterthwaite df |
| GLM families & links | internal/maths/family/ | Gaussian, Binomial, Poisson, link functions |
| Backend dispatch | internal/maths/backend/ | get_ops(), get_backend(), ArrayOps |
| RNG abstraction | internal/maths/rng.py | RNG.from_seed(), .split(), .normal() |
| DataFrame schemas | internal/containers/schemas.py | ParamsAsymp, DiagnosticsSchema |
| Container validators | internal/containers/validators.py | is_ndarray, is_optional_ndarray |
Testing¶
Running tests¶
```shell
# Development workflow
pixi run lint                      # Lint + format + type check (run first)
pixi run test                      # Fast unit tests (default — skips @pytest.mark.slow)
pixi run test full                 # All unit tests including slow
pixi run test slow                 # Only slow tests

# Module-specific
pixi run test-module specs         # Container classes
pixi run test-module distributions # Distribution module
pixi run test-model                # Unified model class
pixi run test-module linalg        # Linear algebra
pixi run test-module viz           # Visualization
pixi run test-module stats         # Statistics
pixi run test-module data          # Data loading

# R parity
pixi run parity                    # All fast R parity tests
pixi run parity slow               # All slow parity tests
pixi run parity lm                 # LM vs R
pixi run parity glm                # GLM vs R
pixi run parity lmer               # LMER vs R
pixi run parity glmer              # GLMER vs R
pixi run parity emmeans            # EMMs vs R
pixi run parity compare            # Compare vs R

# Property-based
pixi run hypothesis                # All Hypothesis tests

# Environment
pixi run test pyodide              # WASM/browser compatibility
```

What each layer covers¶
| Layer | Target | Value |
|---|---|---|
| Unit tests | internal/ domain modules, internal/maths/ | Logic correctness — pure functions with clear I/O |
| Parity tests | Full pipeline vs R/lme4 | Numerical correctness — coefficients, SEs, CIs match R |
| Hypothesis tests | Mathematical invariants | Property correctness — theorems hold for all valid input |
| Redteam tests | Edge cases, pathological input | Robustness — graceful failures, informative errors |
| Pyodide tests | Browser environment | Portability — NumPy backend works without JAX |
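The Hypothesis layer checks mathematical invariants of this kind. Below is a dependency-free sketch of the same idea, drawing random matrices directly instead of using Hypothesis strategies; the chosen invariant (QR reconstruction) and tolerances are illustrative, not bossanova's actual test suite:

```python
import numpy as np

rng = np.random.default_rng(0)

for _ in range(25):
    # Random tall matrix — the kind of input a Hypothesis strategy would draw
    n = int(rng.integers(3, 20))
    p = int(rng.integers(1, 3))
    X = rng.normal(size=(n, p))

    Q, R = np.linalg.qr(X)

    # Invariant 1: the factorization reconstructs X
    assert np.allclose(Q @ R, X, atol=1e-10)
    # Invariant 2: Q has orthonormal columns
    assert np.allclose(Q.T @ Q, np.eye(Q.shape[1]), atol=1e-10)
```

Hypothesis automates the input generation and shrinks failing cases; the property being asserted is the same either way.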
Writing parity tests¶
Parity tests compare bossanova output against R reference values. The pattern:
1. Generate reference values in R (stored as fixtures or computed in-test via `Rscript`)
2. Fit the same model in bossanova
3. Assert numerical agreement within tolerance (~1e-6 for coefficients, ~1e-4 for p-values)
When bossanova and R disagree, investigate: is it a bug, a difference in defaults, or a methodological choice? Document the finding either way.
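A minimal shape for such a test, with made-up reference numbers standing in for values an Rscript fixture would export, and a plain least-squares fit standing in for the bossanova model under test:

```python
import numpy as np

# Hypothetical reference values, as if exported from R's lm() via Rscript
R_COEF = np.array([2.0, 0.5])


def fit_ols(X, y):
    # Stand-in for the bossanova fit under test: ordinary least squares
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = X @ R_COEF  # noise-free, so both systems should agree exactly

# Coefficient tolerance from the guideline above (~1e-6)
np.testing.assert_allclose(fit_ols(X, y), R_COEF, atol=1e-6)
```

Real parity tests follow the same assert-within-tolerance shape, but against fixtures actually produced by R.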
Code Quality Checklist¶
Before marking any implementation work as done:
- `pixi run lint` passes (ruff + ty check)
- `pixi run test` passes (all unit tests)
- Every new function has full type annotations (params + return)
- Every new function has a Google-style docstring (Args, Returns at minimum)
- Every new container field has a validator
- No new file exceeds 1200 lines (800 preferred; 800–1200 only for cohesive units) (`wc -l path/to/file.py`)
- No implementation logic added to the model class
- `__all__` updated in affected modules
Conventions¶
Type hints¶
Use modern Python 3.10+ syntax:
```python
# Correct
def fit(self, weights: str | None = None) -> Self: ...

# Avoid
def fit(self, weights: Optional[str] = None) -> "Model": ...
```

Every function in internal/ must have full type annotations — all parameters and return type.
Docstrings (Google style)¶
```python
def compute_emm(
    spec: ModelSpec,
    bundle: DataBundle,
    fit: FitState,
    parsed: ExploreFormula,
) -> MeeState:
    """Compute estimated marginal means.

    Args:
        spec: Model configuration (family, link, formula).
        bundle: Validated model data (X, y, factor levels).
        fit: Fitted model state (coefficients, vcov).
        parsed: Parsed explore formula (focal variable, conditions).

    Returns:
        MeeState with grid, estimates, and metadata.
    """
```

Error handling¶
- Validate at system boundaries (container constructors, model methods)
- Trust internal callers in `maths/` functions
- Use informative messages: “Variable ‘x’ not found in data. Available: ...”
- Broad `except Exception` only in resampling loops (return NaN sentinel)
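The boundary-message and NaN-sentinel rules can be sketched together. All names, messages, and the toy statistic below are illustrative, not bossanova code:

```python
import numpy as np


def get_column(data: dict[str, np.ndarray], name: str) -> np.ndarray:
    # Boundary validation with an informative message listing alternatives
    if name not in data:
        available = ", ".join(sorted(data))
        raise KeyError(f"Variable '{name}' not found in data. Available: {available}")
    return data[name]


def bootstrap_stat(y: np.ndarray, n_boot: int = 100, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    out = np.empty(n_boot)
    for i in range(n_boot):
        sample = rng.choice(y, size=y.size, replace=True)
        try:
            out[i] = sample.mean() / sample.std()  # toy statistic; may misbehave
        except Exception:
            # NaN sentinel: one pathological resample shouldn't kill the loop
            out[i] = np.nan
    return out
```

Inside `internal/maths/`, by contrast, functions would call `data[name]` directly and trust their callers.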
Naming¶
- Module files: `lowercase_with_underscores.py`
- Test files: `test_` prefix
- Hypothesis tests: `*_hypothesis.py` suffix
- No `_` prefix for standalone functions in `internal/` — the module boundary communicates privacy
- `_` prefix for model class helper methods (`model._check_fitted()`)
File size¶
Target 400–800 lines per file. Up to 1200 is acceptable for cohesive units (e.g., all EMM computation in one file). Over 1200 always requires a split. When a file approaches the limit:
1. Identify natural split points (distinct concerns, helper clusters)
2. Split into focused modules
3. Use `__init__.py` re-exports to maintain import paths
Module hygiene¶
- Every module declares `__all__`
- Imports are ordered: stdlib, third-party, internal, conditional (`TYPE_CHECKING`)
- In files over 300 lines, use section markers for navigation:

```python
# =============================================================================
# Core Functions
# =============================================================================
```

JAX imports¶
Never import JAX unconditionally — it breaks Pyodide:
```python
# Correct: conditional import with fallback
try:
    import jax
    import jax.numpy as jnp
except ImportError:
    jax = None  # NumPy-only environments (e.g. Pyodide)
    import numpy as jnp
```

Use get_ops() from internal/maths/backend/ for array operations. Use RNG from internal/maths/rng instead of direct JAX/NumPy random calls.
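The conditional-import rule is what makes helpers like `get_backend()` possible. A standalone sketch of the dispatch idea (the real bossanova API in `internal/maths/backend/` may differ; `get_namespace` is a name invented for this sketch):

```python
def get_backend() -> str:
    """Report which array backend is importable, preferring JAX."""
    try:
        import jax  # noqa: F401

        return "jax"
    except ImportError:
        return "numpy"


def get_namespace():
    # Hypothetical helper: return the array namespace matching the backend
    if get_backend() == "jax":
        import jax.numpy as jnp

        return jnp
    import numpy as np

    return np


xp = get_namespace()
total = xp.sum(xp.asarray([1.0, 2.0, 3.0]))
```

Because the import happens inside the function, merely loading the module never requires JAX, which keeps Pyodide working.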