
Developer Guide

Practical patterns for contributors

UCSD Psychology

This guide covers the practical patterns and conventions for working on bossanova.


Getting Started

# Clone and install
git clone https://github.com/sciminds/bossanova.git
cd bossanova
pixi install

# Verify everything works
pixi run lint          # ruff + ty check
pixi run test          # unit tests
pixi run parity        # R parity tests (requires R + lme4)

All development tasks are managed through pixi. See the full task list in pyproject.toml.


Project Layout

bossanova/
├── internal/                      # ALL implementation lives here
│   ├── containers/                #   Data definitions (the "Entities")
│   │   ├── structs/               #     Frozen data classes
│   │   │   ├── specs.py           #       ModelSpec — formula, family, link
│   │   │   ├── data.py            #       DataBundle — X, y, groups, metadata
│   │   │   ├── state.py           #       FitState, InferenceState, PredictionState
│   │   │   ├── explore.py         #       MeeState, ExploreFormula
│   │   │   ├── display.py         #       Display/rendering state
│   │   │   └── formula.py         #       Formula-level containers
│   │   ├── builders/              #     Smart constructors for containers
│   │   ├── validators.py          #     is_ndarray, is_optional_ndarray
│   │   └── schemas.py             #     Polars DataFrame output schemas
│   │
│   ├── design/                    #   Design matrix coding (treatment, sum, etc.)
│   ├── formula/                   #   Formula parsing, design matrices
│   ├── marginal/                  #   EMMs, slopes, contrasts
│   ├── fit/                       #   Model fitting, diagnostics, convergence
│   ├── infer/                     #   Inference (bootstrap, permutation, CV, resample/)
│   ├── compare/                   #   Model comparison (LRT, AIC, F-test)
│   ├── simulation/                #   Data simulation
│   ├── rendering/                 #   Summary display formatting
│   │
│   ├── maths/                     #   Pure math, backend-aware
│   │   ├── backend/               #     JAX/NumPy dispatch (ArrayOps, get_ops)
│   │   ├── solvers/               #     QR, IRLS, PLS, PIRLS
│   │   ├── linalg/                #     QR, SVD, Cholesky, sparse ops
│   │   ├── inference/             #     Statistical inference math
│   │   ├── family/                #     GLM families and link functions
│   │   ├── distributions/         #     Distribution utilities
│   │   └── rng.py                 #     Backend-agnostic RNG
│   │
│   └── viz/                       #   Plotting implementations
│
├── model/                         # User-facing model class (thin glue)
├── distributions/                 # User-facing distribution factories
├── data/                          # User-facing dataset loading
├── expressions.py                 # User-facing formula transforms
└── __init__.py                    # Re-exports: compare, viz, lrt

Thinking in ECS

bossanova’s development workflow is container-first, in the spirit of an entity-component-system (ECS): define the data shape before writing logic.

The workflow

  1. DEFINE the container → What fields? What types? What invariants?

  2. IMPLEMENT the operation → Accept containers in, return containers out

  3. WIRE into model class → Thin glue: check → call → store → return self

Never skip step 1. Never start at step 3.
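The three steps can be sketched end to end with stdlib dataclasses (the real code uses frozen attrs classes; MeanState, compute_mean, and Model below are hypothetical stand-ins, not bossanova names):

```python
from dataclasses import dataclass

# Step 1 — DEFINE the container: fields, types, invariants
@dataclass(frozen=True)
class MeanState:
    mean: float

# Step 2 — IMPLEMENT the operation: containers in, containers out
def compute_mean(values: tuple[float, ...]) -> MeanState:
    return MeanState(mean=sum(values) / len(values))

# Step 3 — WIRE into the model class: check → call → store → return self
class Model:
    def __init__(self, values: tuple[float, ...]):
        self._values = values
        self._mean: MeanState | None = None

    def summarize(self) -> "Model":
        self._mean = compute_mean(self._values)  # thin glue, no logic
        return self

m = Model((1.0, 2.0, 3.0)).summarize()
assert m._mean == MeanState(mean=2.0)
```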

Containers

All containers are frozen attrs classes with validators on every field:

import numpy as np
import polars as pl

from attrs import frozen, field, validators
from bossanova.internal.containers.validators import is_ndarray, is_optional_ndarray

@frozen
class MeeState:
    """Marginal effects / EMM results.

    Created by: compute_emm(), compute_slopes()
    Consumed by: model.effects property, compute_mee_inference()
    Augmented by: attrs.evolve() after inference adds SEs/CIs
    """

    grid: pl.DataFrame = field(repr=False)
    estimate: np.ndarray = field(validator=is_ndarray)
    type: str = field(validator=validators.in_(("means", "slopes", "contrasts")))

    # Optional — added via evolve() after inference
    se: np.ndarray | None = field(default=None, validator=is_optional_ndarray)

Rules:

  - Decorate with @frozen; containers are immutable after construction.
  - Attach a validator to every field (is_ndarray, is_optional_ndarray, validators.in_(), ...).
  - Give optional fields a None default; they are filled later via attrs.evolve().
  - Document the lifecycle in the docstring: who creates, consumes, and augments the container.

Operations

Operations are pure functions: containers in, containers out.

def compute_emm(
    spec: ModelSpec,
    bundle: DataBundle,
    fit: FitState,
    parsed: ExploreFormula,
) -> MeeState:
    """Compute estimated marginal means."""
    grid = build_reference_grid(bundle, parsed)
    estimates = grid_predictions(spec, bundle, fit, grid)
    return MeeState(grid=grid, estimate=estimates, type="means", ...)

Never accept the model class as a parameter. Never produce side effects.

Model class as glue

The model orchestrates — it does not implement:

def explore(self, formula: str, **kwargs) -> Self:
    self._check_fitted()
    parsed = parse_explore_formula(formula)      # internal/formula/
    self._mee = compute_emm(                     # internal/marginal/
        self._spec, self._bundle, self._fit,
        parsed, **kwargs,
    )
    return self

Test for glue: strip all logic from the method. If what remains is check → call → store → return self, it’s glue. If it has loops, conditionals on data, or array operations, extract to internal/.


Key Patterns

Backend dispatch

Two patterns for handling JAX/NumPy duality:

Pattern A — Early dispatch for algorithms requiring JAX control flow:

def fit_glm_irls(...):
    backend = get_backend()
    if backend == "jax":
        return _fit_glm_irls_jax(...)   # lax.while_loop
    else:
        return _fit_glm_irls_numpy(...)  # Python while loop

Pattern B — Polymorphic ops for semantically identical algorithms:

def compute_leverage(X: np.ndarray) -> np.ndarray:
    ops = get_ops()
    Q, _ = ops.qr(X)
    return ops.np.sum(Q**2, axis=1)

Use Pattern A when control flow genuinely differs. Use Pattern B everywhere else.
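The Pattern B idea can be sketched with a pure-Python stand-in for the real ArrayOps (PurePythonOps and vector_norm are hypothetical names for illustration):

```python
import math

class PurePythonOps:
    """Hypothetical stand-in exposing the small `ops` surface the algorithm needs."""
    @staticmethod
    def sumsq(xs: list[float]) -> float:
        return sum(x * x for x in xs)

def vector_norm(xs: list[float], ops) -> float:
    # The algorithm is written once; only the `ops` object varies by backend.
    return math.sqrt(ops.sumsq(xs))

assert vector_norm([3.0, 4.0], PurePythonOps()) == 5.0
```

Because the algorithm only touches `ops`, swapping the backend never requires touching the algorithm itself.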

JIT caching

Cache JIT-compiled functions per backend to avoid recompilation:

_cache: dict[str, Any] = {}

def _make_fn(ops):
    def _core(X, y):
        Q, R = ops.qr(X)
        return ops.np.linalg.solve(R, Q.T @ y)
    return ops.jit(_core)

def _get_fn():
    backend = get_backend()
    if backend not in _cache:
        _cache[backend] = _make_fn(get_ops())
    return _cache[backend]

Augmentation via evolve

Containers grow through attrs.evolve(), not mutation:

from attrs import evolve

# After fitting
fit = fit_model(spec, bundle)

# After inference — evolve to add SEs, CIs
mee = compute_emm(spec, bundle, fit, parsed)
mee = evolve(mee, se=ses, ci_lower=ci_lo, ci_upper=ci_hi)

Polars, not pandas

bossanova is Polars-native internally. Pandas is accepted at the API boundary and converted immediately:

def __init__(self, data: pl.DataFrame | PandasDataFrame):
    if not isinstance(data, pl.DataFrame):
        data = pl.from_pandas(data)
    self._data = data  # Always Polars internally
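The boundary-conversion idea generalizes: accept several input types at the API edge, normalize once, and use a single type everywhere downstream. A stdlib-only sketch (Dataset is a hypothetical class, not bossanova's):

```python
from collections.abc import Sequence

class Dataset:
    def __init__(self, data: Sequence[float]):
        # Convert immediately at the boundary; everything downstream
        # sees exactly one internal representation.
        self._data: tuple[float, ...] = tuple(data)

assert Dataset([1.0, 2.0])._data == Dataset((1.0, 2.0))._data
```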

Where to Find Things

Before writing new code, check what already exists:

| Need | Location | Examples |
| --- | --- | --- |
| Formula parsing | internal/formula/ | parse_explore_formula(), build_design_matrix() |
| Marginal effects | internal/marginal/ | compute_emm(), compute_slopes() |
| Model fitting | internal/fit/ | fit_model(), compute_diagnostics() |
| Inference & resampling | internal/infer/ | dispatch_infer(), bootstrap_inference() |
| Model comparison | internal/compare/ | compare_models(), compute_lrt() |
| Design matrix coding | internal/design/ | treatment_coding(), sum_coding() |
| Linear algebra | internal/maths/linalg/ | QR, SVD, Cholesky |
| Solvers | internal/maths/solvers/ | QR solve (LM), IRLS (GLM), PLS/PIRLS (mixed) |
| Statistical inference | internal/maths/inference/ | SEs, CIs, p-values, Satterthwaite df |
| GLM families & links | internal/maths/family/ | Gaussian, Binomial, Poisson, link functions |
| Backend dispatch | internal/maths/backend/ | get_ops(), get_backend(), ArrayOps |
| RNG abstraction | internal/maths/rng.py | RNG.from_seed(), .split(), .normal() |
| DataFrame schemas | internal/containers/schemas.py | ParamsAsymp, DiagnosticsSchema |
| Container validators | internal/containers/validators.py | is_ndarray, is_optional_ndarray |

Testing

Running tests

# Development workflow
pixi run lint              # Lint + format + type check (run first)
pixi run test              # Fast unit tests (default — skips @pytest.mark.slow)
pixi run test full         # All unit tests including slow
pixi run test slow         # Only slow tests

# Module-specific
pixi run test-module specs         # Container classes
pixi run test-module distributions # Distribution module
pixi run test-model                # Unified model class
pixi run test-module linalg        # Linear algebra
pixi run test-module viz           # Visualization
pixi run test-module stats         # Statistics
pixi run test-module data          # Data loading

# R parity
pixi run parity               # All fast R parity tests
pixi run parity slow          # All slow parity tests
pixi run parity lm            # LM vs R
pixi run parity glm           # GLM vs R
pixi run parity lmer          # LMER vs R
pixi run parity glmer         # GLMER vs R
pixi run parity emmeans       # EMMs vs R
pixi run parity compare       # Compare vs R

# Property-based
pixi run hypothesis        # All Hypothesis tests

# Environment
pixi run test pyodide      # WASM/browser compatibility

What each layer covers

| Layer | Target | Value |
| --- | --- | --- |
| Unit tests | internal/ domain modules, internal/maths/ | Logic correctness — pure functions with clear I/O |
| Parity tests | Full pipeline vs R/lme4 | Numerical correctness — coefficients, SEs, CIs match R |
| Hypothesis tests | Mathematical invariants | Property correctness — theorems hold for all valid input |
| Redteam tests | Edge cases, pathological input | Robustness — graceful failures, informative errors |
| Pyodide tests | Browser environment | Portability — NumPy backend works without JAX |

Writing parity tests

Parity tests compare bossanova output against R reference values. The pattern:

  1. Generate reference values in R (stored as fixtures or computed in-test via Rscript)

  2. Fit the same model in bossanova

  3. Assert numerical agreement within tolerance (~1e-6 for coefficients, ~1e-4 for p-values)

When bossanova and R disagree, investigate: is it a bug, a difference in defaults, or a methodological choice? Document the finding either way.
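A minimal sketch of the tolerance check in step 3, using stdlib math only (the reference numbers and the assert_parity helper are illustrative, not real fixtures):

```python
import math

# Hypothetical R reference values — in practice generated by an Rscript
# and stored as a fixture; these numbers are illustrative only.
R_COEFS = {"(Intercept)": 2.3514, "x": 0.8172}

def assert_parity(ours: dict[str, float], reference: dict[str, float],
                  tol: float = 1e-6) -> None:
    """Assert elementwise agreement within an absolute tolerance."""
    for name, ref in reference.items():
        assert math.isclose(ours[name], ref, rel_tol=0.0, abs_tol=tol), name

# Fit the same model in bossanova, then compare its coefficients:
assert_parity({"(Intercept)": 2.35140000005, "x": 0.81720000002}, R_COEFS)
```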


Code Quality Checklist

Before marking any implementation work as done:

  - pixi run lint passes (ruff + ty check).
  - pixi run test passes; run pixi run parity when numerical behavior changed.
  - Every function in internal/ has full type annotations and a Google-style docstring.
  - New containers are frozen attrs classes with validators on every field.
  - Operations are pure (containers in, containers out); model methods remain thin glue.
  - Files stay within the size targets; JAX is never imported unconditionally.


Conventions

Type hints

Use modern Python 3.10+ syntax:

# Correct
def fit(self, weights: str | None = None) -> Self: ...

# Avoid
def fit(self, weights: Optional[str] = None) -> "Model": ...

Every function in internal/ must have full type annotations — all parameters and return type.

Docstrings (Google style)

def compute_emm(
    spec: ModelSpec,
    bundle: DataBundle,
    fit: FitState,
    parsed: ExploreFormula,
) -> MeeState:
    """Compute estimated marginal means.

    Args:
        spec: Model configuration (family, link, formula).
        bundle: Validated model data (X, y, factor levels).
        fit: Fitted model state (coefficients, vcov).
        parsed: Parsed explore formula (focal variable, conditions).

    Returns:
        MeeState with grid, estimates, and metadata.
    """

Error handling

Naming

File size

Target 400–800 lines per file. Up to 1200 is acceptable for cohesive units (e.g., all EMM computation in one file). Over 1200 always requires a split. When a file approaches the limit:

  1. Identify natural split points (distinct concerns, helper clusters)

  2. Split into focused modules

  3. Use __init__.py re-exports to maintain import paths

Module hygiene

Delimit major sections within a module with banner comments:

# =============================================================================
# Core Functions
# =============================================================================

JAX imports

Never import JAX unconditionally — it breaks Pyodide:

# Correct: conditional import with fallback
try:
    import jax
    import jax.numpy as jnp
except ImportError:
    import numpy as jnp

Use get_ops() from internal/maths/backend/ for array operations. Use RNG from internal/maths/rng instead of direct JAX/NumPy random calls.