
Doing the Bossa Nova

UCSD Psychology

This tutorial gives you a quick tour of the workflow and features that bossanova provides for estimating a variety of models through a single, unified model class. The interface works the same no matter what type of model you’re fitting:

| Model Type | Syntax | R equivalent |
|---|---|---|
| linear | `model("y ~ x")` | `lm(y ~ x)` |
| generalized linear | `model("y ~ x", family="binomial")` | `glm(y ~ x, family=binomial)` |
| linear mixed/multi-level | `model("y ~ x + (1 \| x)")` | `lme4::lmer(y ~ x + (1 \| x))` |
| generalized mixed/multi-level | `model("y ~ x + (1 \| x)", family="binomial")` | `lme4::glmer(y ~ x + (1 \| x), family=binomial)` |

bossanova is also tightly integrated with polars DataFrames and seaborn FacetGrids, and it includes several built-in datasets for teaching and demonstration purposes. Let’s load one now to walk through basic usage.

import polars as pl
from polars import col
import seaborn as sns
from bossanova import load_dataset
credit = load_dataset("credit")
credit.head()

Overview

Methods

| Name | Description |
|---|---|
| `fit()` | Fit the model to data. |
| `simulate()` | Generate data from scratch or simulate responses from a fitted model. |
| `predict()` | Generate predictions from the fitted model. |
| `explore()` | Compute marginal effects or estimated marginal means (EMMs). |
| `infer()` | Augment current results with statistical inference. |
| `summary()` | Generate a formatted model summary (R-style or compact). |

Attributes

| Attribute | Type | Available after | Description |
|---|---|---|---|
| `designmat` | `pl.DataFrame` | model creation | Fixed-effects design matrix as a named DataFrame. |
| `metadata` | `pl.DataFrame` | model creation | Metadata about missing observations. |
| `params` | `pl.DataFrame` | `.fit()` | Coefficient estimates table from the fitted model. |
| `diagnostics` | `pl.DataFrame` | `.fit()` | Model-level goodness-of-fit diagnostics. |
| `effects` | `pl.DataFrame` | `.explore()` | Marginal effects or estimated marginal means (EMMs) table. |
| `predictions` | `pl.DataFrame` | `.predict()` | Predictions table from the fitted model. |
| `simulations` | `pl.DataFrame` | `.simulate()` | Simulations table from data generation or response simulation. |
| `resamples` | `pl.DataFrame \| None` | `.infer()` | Raw resampled values from bootstrap or permutation inference. |

1. Universal model class

A model needs at least two arguments:

  1. formula: a string specifying the model using Wilkinson notation (same as R’s lm/glm/lmer/glmer)

  2. data: a polars DataFrame (optional if you just want to simulate)

# Import model 
from bossanova import model

# Create one
m1 = model("Balance ~ Income * Student", credit)

# Check it out
m1

The design matrix

Before fitting, we can inspect the design matrix, the actual numbers the model will work with. This is the $X$ matrix from the GLM equation $\hat{y} = X\hat{\beta}$:

m1.designmat.head()
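As a rough, stdlib-only sketch (with hypothetical values, assuming the default treatment coding for Student), each row of this model's design matrix combines an intercept, Income, a Student indicator, and their product:

```python
# Hypothetical sketch: for "Balance ~ Income * Student", each design-matrix
# row holds an intercept, Income, a Student dummy, and the interaction column.
def design_row(income, student):
    is_student = 1.0 if student == "Yes" else 0.0
    return [1.0, income, is_student, income * is_student]

design_row(50.0, "Yes")  # interaction column is active for students
design_row(50.0, "No")   # dummy and interaction columns are zero
```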

Visualizing it can be extremely helpful to ensure estimates are what you intended when fitting:

# seaborn facetgrid kwargs are supported
m1.plot_design()
m1.plot_design(aspect=2)

The design matrix has one row per observation and one column per model term.

Multi-collinearity & variance inflation

Before fitting a model we can inspect how much multi-collinearity between predictors might inflate our uncertainty:

m1.plot_vif()
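To build intuition for what a VIF measures (this is the standard definition, not bossanova internals): regress each predictor on the others and compute $1 / (1 - R^2)$. With only two predictors that reduces to the squared correlation between them, which a stdlib-only sketch can show:

```python
# With two predictors, VIF = 1 / (1 - r^2), where r is their correlation.
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length lists.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]  # nearly collinear with x1
r = pearson(x1, x2)
vif = 1 / (1 - r**2)  # large VIF -> inflated standard errors
```

A VIF near 1 means a predictor is nearly orthogonal to the rest; values far above 1 signal that its standard error is being inflated by collinearity.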

Factors & transformations

Since we’re including an interaction, it’s helpful to center Income. bossanova supports all kinds of transformations directly in the formula:

# Center Income to reduce collinearity with the interaction
m1 = model("Balance ~ center(Income) * Student", credit)

# See new VIF
m1.plot_vif()

Because polars requires every column of a DataFrame to have a single type, bossanova automatically treats all non-numeric columns as factors, using a default treatment/dummy coding scheme with the first alphabetical level as the reference:

m2 = model("Balance ~ Ethnicity", credit)
m2.plot_design()

We can adjust this by specifying the coding scheme directly in the formula:

m2 = model("Balance ~ poly(Ethnicity)", credit)
m2.plot_design()

Formula reference

Also available in the model API reference.

Operators

| Operator | Example | Meaning |
|---|---|---|
| `+` | `a + b` | Main effects (additive) |
| `*` | `a * b` | Main effects + interaction (`a + b + a:b`) |
| `:` | `a:b` | Interaction term only |
| `**` / `^` | `x**2` | Power term (quadratic, cubic, etc.) |
| `\|` | `(1 \| group)` | Random effect (mixed models) |

In-formula transforms

| Transform | Effect | Example |
|---|---|---|
| `center(x)` | `x - mean(x)` | `y ~ center(age)` |
| `norm(x)` | `x / sd(x)` | `y ~ norm(income)` |
| `zscore(x)` | `(x - mean(x)) / sd(x)` | `y ~ zscore(x)` |
| `scale(x)` | `(x - mean(x)) / (2 * sd(x))` (Gelman scaling) | `y ~ scale(x)` |
| `rank(x)` | Average-method rank | `y ~ rank(x)` |
| `signed_rank(x)` | `sign(x) * rank(\|x\|)` | `y ~ signed_rank(x)` |
| `log(x)` | Natural log | `y ~ log(x)` |
| `log10(x)` | Base-10 log | `y ~ log10(x)` |
| `sqrt(x)` | Square root | `y ~ sqrt(x)` |
| `poly(x, d)` | Orthogonal polynomial basis of degree d | `y ~ poly(x, 2)` |
| `factor(x)` | Treat as categorical | `y ~ factor(year)` |

All stateful transforms (center, norm, zscore, scale, rank, signed_rank) learn parameters from training data and reapply them on new data during .predict(). Nesting is supported: zscore(rank(x)).
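The "learn on training data, reapply on new data" behavior can be sketched with a tiny stateful object (the class and method names here are illustrative, not bossanova's API):

```python
# Minimal sketch of a stateful transform: center() learns the training mean
# once and reapplies that same mean to new data at predict time.
class Center:
    def fit(self, values):
        self.mean = sum(values) / len(values)
        return self

    def transform(self, values):
        return [v - self.mean for v in values]

centerer = Center().fit([10.0, 20.0, 30.0])  # training mean = 20.0
shifted = centerer.transform([25.0])         # centered with 20.0, not 25.0
```

This matters for prediction: if new data were centered on its own mean, the transformed values would no longer line up with the coefficients learned during fitting.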

Contrast coding (in-formula)

| Function | Example | Intercept means | Coefficients mean |
|---|---|---|---|
| `treatment(x)` | `y ~ treatment(group)` | Reference level mean | Difference from reference |
| `sum(x)` | `y ~ sum(group)` | Grand mean | Deviation from grand mean |
| `helmert(x)` | `y ~ helmert(group)` | Grand mean | Level vs. mean of previous levels |
| `poly(x)` | `y ~ poly(group)` | Grand mean | Linear, quadratic, cubic trends |
| `sequential(x)` | `y ~ sequential(group)` | First level mean | Successive differences (B-A, C-B) |

Set a custom reference level or ordering with keyword args: treatment(group, ref=B), sum(group, omit=A), helmert(group, [low, med, high]).

model("y ~ sum(group) + x", data)                    # sum coding
model("y ~ treatment(group, ref=B) + x", data)       # treatment with ref="B"
model("y ~ group + x", data, contrasts={"group": M}) # custom ndarray matrix

2. .fit(): model estimation

To estimate the model parameters ($\hat{\beta}$), we can call .fit().

Parameter estimates

This returns the parameter estimates as a DataFrame, which is also accessible via .params:

m1.fit()

Model diagnostics

The .diagnostics property gives us key fit statistics:

m1.diagnostics

We can also check .metadata, which summarizes useful information about the data the model used for fitting. This is helpful because rows with missing data must be dropped before fitting, and .metadata lets you inspect if/when that happens.

m1.metadata

Models also include a variety of plotting methods for inspecting model fit and errors:

m1.plot_resid()

Augmented fits & predictions

Similar to libraries like broom in R, after calling .fit(), the .data property of a model stores the original data plus several computed columns:

| Column | Description |
|---|---|
| `fitted` | Predicted values ($\hat{y}_i$) |
| `resid` | Residuals ($y_i - \hat{y}_i$) |
| `hat` | Leverage values |
| `std_resid` | Standardized residuals |
| `cooksd` | Cook’s distance (influence) |

m1.data.head()

Let’s plot predicted vs observed Balance to see how well our model does:

sns.relplot(
    data=m1.data.to_pandas(),
    x="fitted",
    y="Balance",
    kind="scatter",
    height=4,
    aspect=1.2,
).set_axis_labels("Predicted Balance", "Observed Balance")

3. .infer(): uncertainty & inference

Asymptotic (default)

The default augments the parameter estimates with standard errors, confidence intervals, and p-values based on standard asymptotic (large-sample) theory:

m1.infer()
sleep = load_dataset("sleep")

lmm = model("Reaction ~ Days + (Days | Subject)", sleep).fit()

lmm.infer()

Bootstrap resampling

m1.infer(how="boot")

Bootstrap inference resamples the data with replacement, refits the model each time, and builds confidence intervals from the empirical distribution of estimates. By default it uses bias-corrected and accelerated (BCa) intervals with 1000 resamples. Control the number with n_boot=:
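The resampling loop itself is conceptually simple. Here is a stdlib-only toy percentile bootstrap for a sample mean; bossanova's default BCa intervals add bias and acceleration corrections not shown here:

```python
# Toy percentile bootstrap CI for a sample mean.
import random

random.seed(0)
data = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 2.5]
boot_means = []
for _ in range(1000):
    resample = random.choices(data, k=len(data))  # sample with replacement
    boot_means.append(sum(resample) / len(resample))
boot_means.sort()
ci = (boot_means[24], boot_means[974])  # 2.5th and 97.5th percentiles
```

In the real thing, "compute the mean" is replaced by "refit the model and record every coefficient", which is why bootstrapping is the slowest inference option.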

m1.infer(how="boot", n_boot=2000)

The raw bootstrap distributions are stored in .resamples and can be visualized:

m1.plot_resamples()

Permutation resampling

Permutation inference compares the observed estimates against a null distribution built by refitting the model on permuted data:

m1.infer(how="perm")

4. .jointtest(): omnibus (ANOVA) tests

While .infer() tests each coefficient individually, .jointtest() performs ANOVA-style omnibus F-tests that ask whether an entire predictor (which may span multiple coefficients, like a multi-level factor or an interaction) explains significant variance:

m1.jointtest()
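The statistic behind an omnibus test of a multi-coefficient term is the standard nested-model F (shown here as a generic formula, not bossanova internals): compare the residual sum of squares with and without the term.

```python
# Omnibus F statistic for dropping a term that spans q coefficients from a
# full model with p coefficients, fit to n observations.
def omnibus_f(rss_reduced, rss_full, q, n, p):
    return ((rss_reduced - rss_full) / q) / (rss_full / (n - p))

# Example: dropping a 3-level factor (q = 2 coefficients) raises RSS
# from 100 to 120 with n = 50 observations and p = 5 coefficients.
f_stat = omnibus_f(120.0, 100.0, q=2, n=50, p=5)
```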

5. .explore(): marginal effects

.explore() computes estimated marginal means (EMMs) for categorical predictors and marginal slopes for continuous ones. It uses an R-style formula mini-language to specify what you want to see. Let’s switch to the Titanic dataset and fit a logistic regression to see how survival depends on sex, age, and their interaction:

titanic = load_dataset("titanic")
m2 = model("survived ~ sex * age", titanic, family="binomial").fit()
m2.infer()

For a categorical predictor, .explore() returns the model-predicted mean at each level; for a continuous predictor, it returns the average slope (rate of change):

# EMMs for sex — predicted survival probability for each level
m2.explore("sex")
# Marginal slope of age — average rate of change in survival per year
m2.explore("age")

Focal exploration with @

The @ operator pins a variable at specific values. Use it to evaluate the model at chosen points rather than the default grid:

# Survival slope at specific ages
m2.explore("age@[20, 40, 60]")
# EMM for a single sex level
m2.explore("sex@female")

Conditional exploration with ~

The ~ operator breaks results out by another variable, letting you see how effects differ across groups or values of a moderator:

# How does the age slope differ between men and women?
m2.explore("age ~ sex")

Contrasts

Compute contrasts directly using bracket notation or named contrast functions:

# Difference in survival between males and females
m2.explore("sex[male - female]")
# Pairwise with inference
m2.explore("pairwise(sex)").infer()

Visualizing with .plot_explore()

.plot_explore() accepts the same formula syntax and produces a seaborn FacetGrid:

m2.plot_explore("age ~ sex")
m2.plot_explore("sex")

Cheatsheet

Also available in the explore API reference.

| Formula | Computes |
|---|---|
| `"X"` | EMMs (categorical) or marginal slope (continuous) |
| `"X@v"` | EMM at a single value |
| `"X@[a, b, c]"` | EMMs at specific levels or values |
| `"X@range(n)"` | EMMs at n evenly-spaced values |
| `"X@quantile(n)"` | EMMs at n quantile values |
| `"X ~ Z"` | Break X out by levels of Z |
| `"X ~ Z@v"` | Pin Z at value v |
| `"X ~ Z@[v1, v2]"` | Cross X with multiple Z values |
| `"X ~ Z@range(n)"` | Cross X with n evenly-spaced Z values |
| `"X ~ Z@quantile(n)"` | Cross X with n quantile Z values |
| `"X ~ A + B"` | Multiple conditions |
| `X[A - B]` | A minus B |
| `X[A - B, C - B]` | Multiple comparisons |
| `X[* - B]` | Each other level vs B |
| `X[(A + B) - C]` | mean(A, B) minus C |
| `X[A - B] ~ Z[C - D]` | Interaction: (A-B at C) minus (A-B at D) |
| `x ~ Z[A - B]` | Slope of x at A minus slope at B |
| `X:Z[A:C - B:D]` | Cell A:C minus cell B:D |

Functions for specific contrasts:

| Formula | Computes | Order-dependent |
|---|---|---|
| `treatment(X, ref=A)` | Each level vs reference A | No |
| `dummy(X, ref=A)` | Same as `treatment` | No |
| `sum(X)` | Each level vs grand mean | No |
| `deviation(X, ref=A)` | Same as `sum` | No |
| `poly(X)` / `poly(X, d)` | Orthogonal polynomials up to degree d | Yes |
| `sequential(X)` | Adjacent: B-A, C-B | Yes |
| `helmert(X)` | Each level vs mean of previous | Yes |
| `pairwise(X)` | All pairs: B-A, C-A, C-B | No |

Effects scaling control:

| Kwarg | Values | Effect |
|---|---|---|
| `effect_scale` | `"link"` (default), `"response"` | Linear predictor vs response scale (GLMs) |
| `varying` | `"exclude"` (default), `"include"` | Population vs group-specific (mixed models) |
| `inverse_transforms` | `True` (default), `False` | Auto-resolve `center(x)` / `zscore(x)` |

6. .predict(): generalization

.predict() generates predictions from the fitted model. It accepts a formula string, a DataFrame, or None:

# Predictions on training data
m2.predict()

Formula-mode predictions

Pass a formula string to build a prediction grid automatically — like R’s ggpredict(). Grid columns are prepended to the output:

# Predicted survival across ages, broken out by sex
m2.predict("age ~ sex")

The same @ modifiers from .explore() work here too:

# 10-point age grid crossed with sex
m2.predict("age@range(10) ~ sex")
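Conceptually, such a grid is just the Cartesian product of the requested values; a stdlib sketch with hypothetical values:

```python
# A grid like "age@[20, 40, 60] ~ sex" crosses every age with every sex level.
from itertools import product

ages = [20, 40, 60]
sexes = ["female", "male"]
grid = [{"age": a, "sex": s} for a, s in product(ages, sexes)]
```

Predictions are then computed once per grid row, and the grid columns are prepended to the output.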

Chain .infer() to add confidence intervals to the grid:

m2.predict("age ~ sex").infer()

DataFrame predictions

Pass a DataFrame to predict on specific new observations:

new_passengers = pl.DataFrame({"sex": ["male", "female", "male"], "age": [25.0, 40.0, 60.0]})
m2.predict(new_passengers)

Visualizing with .plot_predict()

.plot_predict() accepts the same formula syntax and produces a seaborn FacetGrid showing predicted trends with confidence bands:

m2.plot_predict("age ~ sex")
m2.plot_predict("age ~ sex", show_data=True, show_rug=True)

Cross-validated model ablation

Use .infer(how="cv") to evaluate out-of-sample prediction accuracy via k-fold cross-validation. This is useful for comparing models — a model with lower CV error generalizes better:
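The splitting scheme behind k-fold CV can be sketched in one line (a generic illustration; bossanova may shuffle or stratify differently): each observation is held out exactly once across the k refits.

```python
# Interleaved k-fold index assignment: fold i holds out every k-th row
# starting at i, so the folds partition the n observations.
def kfold_indices(n, k):
    return [list(range(i, n, k)) for i in range(k)]

folds = kfold_indices(10, 5)  # 5 folds of 2 held-out rows each
```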

m2.infer(how="cv", k=5)
m2.diagnostics

7. .simulate(): planning

Power Analysis

from bossanova.distributions import normal
m = model("y ~ x")
m.simulate(
    n=100,
    x=normal(0, 1),
    coef={"x": 0.5},
    sigma=1.0,
    power=500,
)
m.power_results

Each row corresponds to one model parameter (here, Intercept and x):

| Column | Meaning |
|---|---|
| `n` | Sample size used |
| `term` | Parameter name |
| `true_value` | The true coefficient you specified (0 for unspecified terms) |
| `power` | Proportion of simulations where the p-value < 0.05 |
| `power_ci_lower`/`upper` | Wilson score 95% CI on the power estimate |
| `coverage` | Proportion of 95% CIs that contain the true value (~0.95 if well-calibrated) |
| `bias` | Mean estimate minus true value (should be near 0) |
| `rmse` | Root mean squared error of estimates |
| `mean_se` | Average estimated standard error |
| `empirical_se` | Standard deviation of estimates across simulations |
| `n_sims` | Number of successful simulations |
| `n_failed` | Number of simulations that failed to converge |

For the Intercept (true value = 0), “power” is really the Type I error rate and should be around 0.05. For x (true value = 0.5), power tells you how often you’d correctly reject $H_0: \beta_x = 0$. power=500 means “run 500 simulations.” Each simulation generates a fresh dataset from your DGP, fits the model, and runs inference. The results are then aggregated into the power table.
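The simulate-fit-test loop can be sketched with the stdlib alone (a rough illustration of one power run, not bossanova's implementation; it uses a z-style test rather than the exact t reference):

```python
# Monte Carlo power for the slope in y = beta * x + noise, via OLS refits.
import math
import random
import statistics

random.seed(1)
n, beta, n_sims = 100, 0.5, 200
rejections = 0
for _ in range(n_sims):
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [beta * xi + random.gauss(0, 1) for xi in x]  # sigma = 1
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    resid = [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]
    se = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
    rejections += abs(b / se) > 1.96  # z-style test of H0: beta = 0
power = rejections / n_sims
```

With this effect size and sample size the true slope is about five standard errors from zero, so essentially every simulation rejects.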

Sweeping Sample Size

The most common power analysis question is: how large does my sample need to be? Pass a list of sample sizes via power={"n": [...]}:

m = model("y ~ x")
m.simulate(
    n=100,
    x=normal(0, 1),
    coef={"x": 0.3},
    sigma=1.0,
    power={"n_sims": 500, "n": [30, 50, 100, 200, 500]},
)
m.power_results.filter(m.power_results["term"] == "x").select(
    ["n", "power", "power_ci_lower", "power_ci_upper"]
)

Now you can see exactly where power crosses 80% — the conventional threshold for adequate power.

Sweeping Effect Size

Sometimes you’re less sure about the effect size than the sample size. You can sweep over coefficients to see how sensitive your design is:

m = model("y ~ x")
m.simulate(
    n=100,
    x=normal(0, 1),
    coef={"x": 0.3},
    sigma=1.0,
    power={"n_sims": 500, "coef": {"x": [0.1, 0.2, 0.3, 0.5, 0.8]}},
)
m.power_results.filter(m.power_results["term"] == "x").select(
    ["true_value", "power", "coverage", "bias"]
)

With $n = 100$, a small effect ($\beta_x = 0.1$) is nearly undetectable, while a large effect ($\beta_x = 0.8$) is found almost every time. Coverage should stay near 0.95 regardless of effect size; that’s a sign the inference is well-calibrated.

Full Factorial Sweeps

You can sweep multiple axes simultaneously. bossanova takes the full Cartesian product:

m = model("y ~ x")
m.simulate(
    n=50,
    x=normal(0, 1),
    coef={"x": 0.3},
    sigma=1.0,
    power={
        "n_sims": 300,
        "n": [50, 100, 200],
        "coef": {"x": [0.2, 0.5]},
    },
)

# Filter to x and show the power surface
m.power_results.filter(
    m.power_results["term"] == "x"
).select(["n", "true_value", "power"])

This gives you a 3 × 2 = 6-row power surface: for each combination of sample size and effect size, you see the estimated power. This is exactly the kind of table you’d include in a grant application or pre-registration.

Tips

Choosing n_sims: More simulations give tighter power estimates (the Wilson CIs shrink). 500 is a good default; 1000+ for final reports. For quick exploration, 100–200 is fine.
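To see how the Wilson interval tightens with more simulations, here is the standard Wilson score formula (the same quantity reported as power_ci_lower/upper) in plain Python:

```python
# Wilson score 95% interval for k rejections out of n simulations.
from math import sqrt

def wilson_ci(k, n, z=1.96):
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - half, center + half)

lo, hi = wilson_ci(400, 500)  # observed power 0.80 from 500 sims
```

Doubling n_sims shrinks the interval width by roughly a factor of $\sqrt{2}$, which is why 500 sims is a reasonable default and 1000+ is worth it for final reports.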

Interpreting coverage: If coverage is substantially below 0.95, your model’s confidence intervals are too narrow (anti-conservative). This can happen with very small samples or misspecified models.

Bias and RMSE: Bias should be near zero for well-specified models. RMSE combines bias and variance; it shrinks as $n$ increases.

Multiple predictors: Power analysis works with any formula; each parameter gets its own row in the results.

8. compare(): model comparison

bossanova provides compare() as a general-purpose model comparison function. You can use it to compare nested and non-nested models with a variety of approaches:

from bossanova import compare
m0 = model("Balance ~ 1", credit).fit()
m1 = model("Balance ~ Income", credit).fit()

compare(m0, m1)

Compare varying (random) effect structures for mixed-models:

sleep = load_dataset("sleep")

intercept_only = model("Reaction ~ Days + (1 | Subject)", sleep).fit()
intercept_slope = model("Reaction ~ Days + (Days | Subject)", sleep).fit()
compare(intercept_only, intercept_slope)