
Developer Guide

Practical patterns for contributors

UCSD Psychology

This guide covers the practical patterns and conventions for working on bossanova.


Getting Started

# Clone and install
git clone https://github.com/sciminds/bossanova.git
cd bossanova
pixi install

# Verify everything works
pixi run lint          # ruff + ty check
pixi run test          # unit tests
pixi run parity        # R parity tests (requires R + lme4)

All development tasks are managed through pixi. See the full task list in pyproject.toml.


Project Layout

bossanova/
├── internal/                      # ALL implementation lives here
│   ├── containers/                #   Data definitions (the "Entities")
│   │   ├── structs/               #     Frozen data classes
│   │   │   ├── specs.py           #       ModelSpec — formula, family, link
│   │   │   ├── data.py            #       DataBundle — X, y, groups, metadata
│   │   │   ├── state.py           #       FitState, InferenceState, PredictionState
│   │   │   ├── explore.py         #       MeeState, ExploreFormula
│   │   │   ├── display.py         #       Display/rendering state
│   │   │   └── formula.py         #       Formula-level containers
│   │   ├── builders/              #     Smart constructors for containers
│   │   ├── validators.py          #     is_ndarray, is_optional_ndarray
│   │   └── schemas.py             #     Polars DataFrame output schemas
│   │
│   ├── design/                    #   Design matrix coding (treatment, sum, etc.)
│   ├── formula/                   #   Formula parsing, design matrices
│   ├── marginal/                  #   EMMs, slopes, contrasts
│   ├── fit/                       #   Model fitting, diagnostics, convergence
│   ├── infer/                     #   Inference (bootstrap, permutation, CV, resample/)
│   ├── compare/                   #   Model comparison (LRT, AIC, F-test)
│   ├── simulation/                #   Data simulation
│   ├── rendering/                 #   Summary display formatting
│   │
│   ├── maths/                     #   Pure math, backend-aware
│   │   ├── backend/               #     JAX/NumPy dispatch (ArrayOps, get_ops)
│   │   ├── solvers/               #     QR, IRLS, PLS, PIRLS
│   │   ├── linalg/                #     QR, SVD, Cholesky, sparse ops
│   │   ├── inference/             #     Statistical inference math
│   │   ├── family/                #     GLM families and link functions
│   │   ├── distributions/         #     Distribution utilities
│   │   └── rng.py                 #     Backend-agnostic RNG
│   │
│   └── viz/                       #   Plotting implementations
│
├── model/                         # User-facing model class (thin glue)
├── distributions/                 # User-facing distribution factories
├── data/                          # User-facing dataset loading
├── expressions.py                 # User-facing formula transforms
└── __init__.py                    # Re-exports: compare, viz, lrt

Thinking in ECS

bossanova’s development workflow is container-first, in the spirit of an entity-component-system (ECS): define the data shape before writing logic.

The workflow

  1. DEFINE the container → What fields? What types? What invariants?

  2. IMPLEMENT the operation → Accept containers in, return containers out

  3. WIRE into model class → Thin glue: check → call → store → return self

Never skip step 1. Never start at step 3.
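The three steps can be sketched end to end with stdlib dataclasses (the real code uses frozen attrs classes; MeanState, compute_mean, and Model below are hypothetical stand-ins, not bossanova names):

```python
from dataclasses import dataclass

# Step 1 — DEFINE the container: fields, types, invariants
@dataclass(frozen=True)
class MeanState:
    mean: float

# Step 2 — IMPLEMENT the operation: containers in, containers out
def compute_mean(values: tuple[float, ...]) -> MeanState:
    return MeanState(mean=sum(values) / len(values))

# Step 3 — WIRE into the model class: check → call → store → return self
class Model:
    def __init__(self, values: tuple[float, ...]):
        self._values = values
        self._mean: MeanState | None = None

    def summarize(self) -> "Model":
        self._mean = compute_mean(self._values)  # thin glue, no logic
        return self

m = Model((1.0, 2.0, 3.0)).summarize()
assert m._mean == MeanState(mean=2.0)
```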

Containers

All containers are frozen attrs classes with validators on every field:

import numpy as np
import polars as pl

from attrs import frozen, field, validators
from bossanova.internal.containers.validators import is_ndarray, is_optional_ndarray

@frozen
class MeeState:
    """Marginal effects / EMM results.

    Created by: compute_emm(), compute_slopes()
    Consumed by: model.effects property, compute_mee_inference()
    Augmented by: attrs.evolve() after inference adds SEs/CIs
    """

    grid: pl.DataFrame = field(repr=False)
    estimate: np.ndarray = field(validator=is_ndarray)
    type: str = field(validator=validators.in_(("means", "slopes", "contrasts")))

    # Optional — added via evolve() after inference
    se: np.ndarray | None = field(default=None, validator=is_optional_ndarray)

Rules:

  - Decorate with @frozen; containers are immutable after construction.
  - Attach a validator to every field (is_ndarray, is_optional_ndarray, validators.in_(), ...).
  - Give optional fields a None default; they are filled later via attrs.evolve().
  - Document the lifecycle in the docstring: who creates, consumes, and augments the container.

Operations

Operations are pure functions: containers in, containers out.

def compute_emm(
    spec: ModelSpec,
    bundle: DataBundle,
    fit: FitState,
    parsed: ExploreFormula,
) -> MeeState:
    """Compute estimated marginal means."""
    grid = build_reference_grid(bundle, parsed)
    estimates = grid_predictions(spec, bundle, fit, grid)
    return MeeState(grid=grid, estimate=estimates, type="means", ...)

Never accept the model class as a parameter. Never produce side effects.

Model class as glue

The model orchestrates — it does not implement:

def explore(self, formula: str, **kwargs) -> Self:
    self._check_fitted()
    parsed = parse_explore_formula(formula)      # internal/formula/
    self._mee = compute_emm(                     # internal/marginal/
        self._spec, self._bundle, self._fit,
        parsed, **kwargs,
    )
    return self

Test for glue: strip all logic from the method. If what remains is check → call → store → return self, it’s glue. If it has loops, conditionals on data, or array operations, extract to internal/.


Key Patterns

Backend dispatch

Two patterns for handling JAX/NumPy duality:

Pattern A — Early dispatch for algorithms requiring JAX control flow:

def fit_glm_irls(...):
    backend = get_backend()
    if backend == "jax":
        return _fit_glm_irls_jax(...)   # lax.while_loop
    else:
        return _fit_glm_irls_numpy(...)  # Python while loop

Pattern B — Polymorphic ops for semantically identical algorithms:

def compute_leverage(X: np.ndarray) -> np.ndarray:
    ops = get_ops()
    Q, _ = ops.qr(X)
    return ops.np.sum(Q**2, axis=1)

Use Pattern A when control flow genuinely differs. Use Pattern B everywhere else.
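The Pattern B idea can be sketched with a pure-Python stand-in for the real ArrayOps (PurePythonOps and vector_norm are hypothetical names for illustration):

```python
import math

class PurePythonOps:
    """Hypothetical stand-in exposing the small `ops` surface the algorithm needs."""
    @staticmethod
    def sumsq(xs: list[float]) -> float:
        return sum(x * x for x in xs)

def vector_norm(xs: list[float], ops) -> float:
    # The algorithm is written once; only the `ops` object varies by backend.
    return math.sqrt(ops.sumsq(xs))

assert vector_norm([3.0, 4.0], PurePythonOps()) == 5.0
```

Because the algorithm only touches `ops`, swapping the backend never requires touching the algorithm itself.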

JIT caching

Cache JIT-compiled functions per backend to avoid recompilation:

_cache: dict[str, Any] = {}

def _make_fn(ops):
    def _core(X, y):
        Q, R = ops.qr(X)
        return ops.np.linalg.solve(R, Q.T @ y)
    return ops.jit(_core)

def _get_fn():
    backend = get_backend()
    if backend not in _cache:
        _cache[backend] = _make_fn(get_ops())
    return _cache[backend]

Augmentation via evolve

Containers grow through attrs.evolve(), not mutation:

from attrs import evolve

# After fitting
fit = fit_model(spec, bundle)

# After inference — evolve to add SEs, CIs
mee = compute_emm(spec, bundle, fit, parsed)
mee = evolve(mee, se=ses, ci_lower=ci_lo, ci_upper=ci_hi)

Polars, not pandas

bossanova is Polars-native internally. Pandas is accepted at the API boundary and converted immediately:

def __init__(self, data: pl.DataFrame | PandasDataFrame):
    if not isinstance(data, pl.DataFrame):
        data = pl.from_pandas(data)
    self._data = data  # Always Polars internally
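The boundary-conversion idea generalizes: accept several input types at the API edge, normalize once, and use a single type everywhere downstream. A stdlib-only sketch (Dataset is a hypothetical class, not bossanova's):

```python
from collections.abc import Sequence

class Dataset:
    def __init__(self, data: Sequence[float]):
        # Convert immediately at the boundary; everything downstream
        # sees exactly one internal representation.
        self._data: tuple[float, ...] = tuple(data)

assert Dataset([1.0, 2.0])._data == Dataset((1.0, 2.0))._data
```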

Where to Find Things

Before writing new code, check what already exists:

| Need | Location | Examples |
| --- | --- | --- |
| Formula parsing | internal/formula/ | parse_explore_formula(), build_design_matrix() |
| Marginal effects | internal/marginal/ | compute_emm(), compute_slopes() |
| Model fitting | internal/fit/ | fit_model(), compute_diagnostics() |
| Inference & resampling | internal/infer/ | dispatch_infer(), bootstrap_inference() |
| Model comparison | internal/compare/ | compare_models(), compute_lrt() |
| Design matrix coding | internal/design/ | treatment_coding(), sum_coding() |
| Linear algebra | internal/maths/linalg/ | QR, SVD, Cholesky |
| Solvers | internal/maths/solvers/ | QR solve (LM), IRLS (GLM), PLS/PIRLS (mixed) |
| Statistical inference | internal/maths/inference/ | SEs, CIs, p-values, Satterthwaite df |
| GLM families & links | internal/maths/family/ | Gaussian, Binomial, Poisson, link functions |
| Backend dispatch | internal/maths/backend/ | get_ops(), get_backend(), ArrayOps |
| RNG abstraction | internal/maths/rng.py | RNG.from_seed(), .split(), .normal() |
| DataFrame schemas | internal/containers/schemas.py | ParamsAsymp, DiagnosticsSchema |
| Container validators | internal/containers/validators.py | is_ndarray, is_optional_ndarray |

Testing

Running tests

# Development workflow
pixi run lint              # Lint + format + type check (run first)
pixi run test              # Fast unit tests (default — skips @pytest.mark.slow)
pixi run test full         # All unit tests including slow
pixi run test slow         # Only slow tests

# Module-specific
pixi run test-module specs         # Container classes
pixi run test-module distributions # Distribution module
pixi run test-model                # Unified model class
pixi run test-module linalg        # Linear algebra
pixi run test-module viz           # Visualization
pixi run test-module stats         # Statistics
pixi run test-module data          # Data loading

# R parity
pixi run parity               # All fast R parity tests
pixi run parity slow          # All slow parity tests
pixi run parity lm            # LM vs R
pixi run parity glm           # GLM vs R
pixi run parity lmer          # LMER vs R
pixi run parity glmer         # GLMER vs R
pixi run parity emmeans       # EMMs vs R
pixi run parity compare       # Compare vs R

# Property-based
pixi run hypothesis        # All Hypothesis tests

# Environment
pixi run test pyodide      # WASM/browser compatibility

What each layer covers

| Layer | Target | Value |
| --- | --- | --- |
| Unit tests | internal/ domain modules, internal/maths/ | Logic correctness — pure functions with clear I/O |
| Parity tests | Full pipeline vs R/lme4 | Numerical correctness — coefficients, SEs, CIs match R |
| Hypothesis tests | Mathematical invariants | Property correctness — theorems hold for all valid input |
| Redteam tests | Edge cases, pathological input | Robustness — graceful failures, informative errors |
| Pyodide tests | Browser environment | Portability — NumPy backend works without JAX |

Writing parity tests

Parity tests compare bossanova output against R reference values. The pattern:

  1. Generate reference values in R (stored as fixtures or computed in-test via Rscript)

  2. Fit the same model in bossanova

  3. Assert numerical agreement within tolerance (~1e-6 for coefficients, ~1e-4 for p-values)

When bossanova and R disagree, investigate: is it a bug, a difference in defaults, or a methodological choice? Document the finding either way.
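A minimal sketch of the tolerance check in step 3, using stdlib math only (the reference numbers and the assert_parity helper are illustrative, not real fixtures):

```python
import math

# Hypothetical R reference values — in practice generated by an Rscript
# and stored as a fixture; these numbers are illustrative only.
R_COEFS = {"(Intercept)": 2.3514, "x": 0.8172}

def assert_parity(ours: dict[str, float], reference: dict[str, float],
                  tol: float = 1e-6) -> None:
    """Assert elementwise agreement within an absolute tolerance."""
    for name, ref in reference.items():
        assert math.isclose(ours[name], ref, rel_tol=0.0, abs_tol=tol), name

# Fit the same model in bossanova, then compare its coefficients:
assert_parity({"(Intercept)": 2.35140000005, "x": 0.81720000002}, R_COEFS)
```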


Code Quality Checklist

Before marking any implementation work as done:

  - pixi run lint passes (ruff + ty check).
  - pixi run test passes; run pixi run parity when numerical behavior changed.
  - Every function in internal/ has full type annotations and a Google-style docstring.
  - New containers are frozen attrs classes with validators on every field.
  - Operations are pure (containers in, containers out); model methods remain thin glue.
  - Files stay within the size targets; JAX is never imported unconditionally.


Conventions

Type hints

Use modern Python 3.10+ syntax:

# Correct
def fit(self, weights: str | None = None) -> Self: ...

# Avoid
def fit(self, weights: Optional[str] = None) -> "Model": ...

Every function in internal/ must have full type annotations — all parameters and return type.

Docstrings (Google style)

def compute_emm(
    spec: ModelSpec,
    bundle: DataBundle,
    fit: FitState,
    parsed: ExploreFormula,
) -> MeeState:
    """Compute estimated marginal means.

    Args:
        spec: Model configuration (family, link, formula).
        bundle: Validated model data (X, y, factor levels).
        fit: Fitted model state (coefficients, vcov).
        parsed: Parsed explore formula (focal variable, conditions).

    Returns:
        MeeState with grid, estimates, and metadata.
    """

Error handling

Naming

File size

Target 400–800 lines per file. Up to 1200 is acceptable for cohesive units (e.g., all EMM computation in one file). Over 1200 always requires a split. When a file approaches the limit:

  1. Identify natural split points (distinct concerns, helper clusters)

  2. Split into focused modules

  3. Use __init__.py re-exports to maintain import paths

Module hygiene

Delimit major sections within a module with banner comments:

# =============================================================================
# Core Functions
# =============================================================================

JAX imports

Never import JAX unconditionally — it breaks Pyodide:

# Correct: conditional import with fallback
try:
    import jax
    import jax.numpy as jnp
except ImportError:
    import numpy as jnp

Use get_ops() from internal/maths/backend/ for array operations. Use RNG from internal/maths/rng instead of direct JAX/NumPy random calls.