Quick reference for the full bossanova API surface. See the model API docs for complete details.
The R tabs use: lme4 + lmerTest (mixed models), emmeans (marginal means & contrasts), sandwich + lmtest (robust SEs), car (Type III tests, VIF), boot (bootstrap), broom (tidy output), simr (power analysis).
```r
install.packages(c("lme4", "lmerTest", "emmeans", "sandwich",
                   "lmtest", "car", "boot", "broom", "simr"))
```

## Model types at a glance

| Model | Formula | `family` | `method` |
|---|---|---|---|
| Linear regression (LM) | `y ~ x1 + x2` | `"gaussian"` (default) | `"ols"` (default) |
| Logistic regression | `y ~ x1 + x2` | `"binomial"` | `"ml"` |
| Poisson regression | `y ~ x1 + x2` | `"poisson"` | `"ml"` |
| Gamma regression | `y ~ x1 + x2` | `"gamma"` | `"ml"` |
| Linear mixed model (LMER) | `y ~ x + (1 \| group)` | `"gaussian"` (default) | `"reml"` (default) |
| GLMM — logistic | `y ~ x + (1 \| group)` | `"binomial"` | `"ml"` |
| GLMM — Poisson | `y ~ x + (1 \| group)` | `"poisson"` | `"ml"` |
## Operators

| Operator | Example | Meaning |
|---|---|---|
| `+` | `a + b` | Main effects (additive) |
| `*` | `a * b` | Main effects + interaction (`a + b + a:b`) |
| `:` | `a:b` | Interaction only |
| `**` / `^` | `x**2` | Polynomial term |
| `(1 \| g)` | `y ~ x + (1 \| subj)` | Random intercept |
| `(x \| g)` | `y ~ x + (x \| subj)` | Random intercept + slope (correlated) |
| `(1 \| g1/g2)` | `y ~ x + (1 \| school/class)` | Nested random effects |
## Transforms

| Transform | Effect | Example |
|---|---|---|
| `center(x)` | `x - mean(x)` | `y ~ center(age)` |
| `zscore(x)` | `(x - mean) / sd` | `y ~ zscore(x)` |
| `scale(x)` | Gelman scaling: `(x - mean) / (2 * sd)` | `y ~ scale(x)` |
| `log(x)` | Natural log | `y ~ log(income)` |
| `poly(x, d)` | Orthogonal polynomial of degree `d` | `y ~ poly(x, 3)` |
| `factor(x)` | Force categorical | `y ~ factor(year)` |
Transforms are stateful — parameters learned at `.fit()` are reapplied at `.predict()`. Nesting supported: `zscore(rank(x))`.
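Statefulness matters for out-of-sample prediction: the mean and sd used by `zscore` must come from the training data, not from the new data. A minimal numpy sketch of the idea (illustrative only, not bossanova internals):

```python
import numpy as np

# Statistics learned from the training column at fit time
train = np.array([2.0, 4.0, 6.0, 8.0])
mu, sd = train.mean(), train.std(ddof=1)

# At predict time, new data is standardized with the *stored* statistics,
# never with the new data's own mean/sd
new = np.array([5.0, 10.0])
z_new = (new - mu) / sd
```

A new value equal to the training mean maps to 0 regardless of what the new batch looks like, which is what keeps predictions on the same scale the model was fit on.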
## Contrast coding

| Function | Intercept = | Coefficients = |
|---|---|---|
| `treatment(x)` | Reference level mean | Difference from reference |
| `sum(x)` | Grand mean | Deviation from grand mean |
| `helmert(x)` | Grand mean | Level vs. mean of prior levels |
| `sequential(x)` | First level mean | Successive differences |
Set reference: `treatment(group, ref=B)`, `sum(group, omit=A)`, `helmert(group, [low, med, high])`.
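To see what the intercept and coefficients mean under the first two schemes, here is a hand-rolled OLS on a three-level factor with group means A = 10, B = 14, C = 18 (plain numpy, independent of bossanova):

```python
import numpy as np

y = np.array([10.0, 10.0, 14.0, 14.0, 18.0, 18.0])
grp = np.array([0, 0, 1, 1, 2, 2])  # levels A, B, C

# Treatment coding (reference = A): indicator columns for B and C
Xt = np.column_stack([np.ones(6), grp == 1, grp == 2]).astype(float)
bt, *_ = np.linalg.lstsq(Xt, y, rcond=None)
# Intercept = mean(A); coefficients = B - A and C - A

# Sum coding (omit C): +1 for own level, -1 for the omitted level
Xs = np.column_stack([np.ones(6),
                      np.where(grp == 2, -1, (grp == 0).astype(int)),
                      np.where(grp == 2, -1, (grp == 1).astype(int))]).astype(float)
bs, *_ = np.linalg.lstsq(Xs, y, rcond=None)
# Intercept = grand mean; coefficients = deviations of A and B from it
```

Here `bt` recovers `[10, 4, 8]` (reference mean, then differences from it) while `bs` recovers `[14, -4, 0]` (grand mean, then deviations), matching the table above.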
## Initialize

```python
from bossanova import model

m = model("y ~ x1 + x2", data)
m = model("y ~ x1 + x2", data, family="binomial")
m = model("y ~ x1 + x2", data, family="poisson")
m = model("y ~ x + (1 | subject)", data)
m = model("y ~ x + (x | subject)", data)  # random slope
m = model("y ~ x + (1 | subject)", data, family="binomial")
m = model("y ~ x + (1 | subject)", data, family="poisson")
```

```r
library(lme4)
library(lmerTest)

m <- lm(y ~ x1 + x2, data)
m <- glm(y ~ x1 + x2, data, family = binomial)
m <- glm(y ~ x1 + x2, data, family = poisson)
m <- lmer(y ~ x + (1 | subject), data)
m <- lmer(y ~ x + (x | subject), data)  # random slope
m <- glmer(y ~ x + (1 | subject), data, family = binomial)
m <- glmer(y ~ x + (1 | subject), data, family = poisson)
```

### Constructor parameters

| Parameter | Type | Default | Notes |
|---|---|---|---|
| `formula` | `str` | required | R-style formula |
| `data` | `DataFrame \| str \| None` | `None` | Polars DataFrame, file path, or `None` for simulation |
| `family` | `str` | `"gaussian"` | `"gaussian"`, `"binomial"`, `"poisson"`, `"gamma"`, `"tdist"` |
| `link` | `str \| None` | `None` | `None` = canonical link for `family` |
| `method` | `str \| None` | `None` | `None` = auto (`"ols"` for LM, `"reml"` for LMER, `"ml"` for GLM/GLMER) |
| `missing` | `str` | `"drop"` | `"drop"` or `"fail"` |
| `contrasts` | `dict \| None` | `None` | `{col: ndarray}` — prefer in-formula syntax instead |
## Fit

Estimate model parameters given some data.

→ API: `.fit()`

```python
m = model("y ~ x", data).fit()
```

```r
m <- lm(y ~ x, data)  # fitting is implicit in R
summary(m)            # coefficients, R², F-test
coef(m)               # coefficients only
```

### Key parameters

| Parameter | Type | Default | Notes |
|---|---|---|---|
| `weights` | `str \| None` | `None` | Column name for observation weights |
| `offset` | `str \| None` | `None` | Column name for offset term |
| `nAGQ` | `int \| None` | `None` | GLMER only: GHQ points (0 = fast, 1 = Laplace, >1 = adaptive) |

### What `.fit()` populates

Augmented data columns (accessible via `m.data`): `fitted`, `resid`, `hat`, `std_resid`, `cooksd`.
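For OLS these diagnostic columns have standard closed forms; a numpy sketch of how they are conventionally computed (illustrative, not bossanova's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 2.0 + 0.5 * x + rng.normal(size=30)

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ beta
resid = y - fitted
# Leverage h_ii: diagonal of the hat matrix X (X'X)^-1 X'
hat = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
p = X.shape[1]
sigma2 = resid @ resid / (len(y) - p)
std_resid = resid / np.sqrt(sigma2 * (1 - hat))   # internally studentized residual
cooksd = std_resid**2 * hat / (p * (1 - hat))     # Cook's distance
```

A useful sanity check: the leverages always sum to the number of model parameters (the trace of a projection matrix equals its rank).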
## Explore

Use a fitted model to estimate marginal effects.

→ API: `.explore()`

```python
m.fit().explore("pairwise(treatment)")
m.fit().explore("treatment")  # estimated marginal means
m.fit().explore("x1")         # marginal slopes
```

```r
library(emmeans)
emmeans(m, ~ treatment)         # estimated marginal means
pairs(emmeans(m, ~ treatment))  # pairwise contrasts
emtrends(m, ~ 1, var = "x1")    # marginal slopes
```

The focal variable (left of `~`) is the variable whose effect is estimated. Conditions (right of `~`) control the reference grid.

| Formula | Computes |
|---|---|
| `"x"` | EMMs for categorical `x`, marginal slope for continuous `x` |
| `"pairwise(x)"` | All pairwise contrasts |
| `"sequential(x)"` | Successive differences (B-A, C-B, ...) |
| `"x ~ z"` | Cross focal `x` with levels of `z` |
| `"x ~ z@[v1, v2]"` | Cross focal `x` with specific `z` values |
| `"x ~ z@range(5)"` | Cross focal `x` with 5 evenly-spaced `z` values |
| `"pairwise(x) ~ z"` | Pairwise contrasts at each level of `z` |
| `"x[A - B]"` | Contrast: A minus B |
| `"x[* - B]"` | Each other level vs B |

| emmeans | Computes |
|---|---|
| `emmeans(m, ~ x)` | EMMs for categorical `x` |
| `emtrends(m, ~ 1, var = "x")` | Marginal slope for continuous `x` |
| `pairs(emmeans(m, ~ x))` | All pairwise contrasts |
| `contrast(emmeans(m, ~ x), method = "consec")` | Successive differences |
| `emmeans(m, ~ x * z)` | Cross `x` with levels of `z` |
| `emmeans(m, ~ x, at = list(z = c(1, 2)))` | Cross `x` with specific `z` values |
| `emtrends(m, ~ 1, var = "x", at = list(z = seq(...)))` | Cross `x` with evenly-spaced `z` values |
| `pairs(emmeans(m, ~ x \| z))` | Pairwise contrasts at each level of `z` |
| `contrast(emmeans(m, ~ x), method = list(c(-1, 1, 0)))` | Custom contrast |

### Key parameters

| Parameter | Type | Default | Notes |
|---|---|---|---|
| `formula` | `str` | required | Focal variable `~` conditions (see table above) |
| `effect_scale` | `str` | `"link"` | `"link"` or `"response"` (GLMs) |
| `how` | `str` | `"auto"` | `"auto"`, `"mem"` (at-mean), or `"ame"` (g-computation) |
| `varying` | `str` | `"exclude"` | `"exclude"` or `"include"` random effects (mixed) |
| `by` | `str \| None` | `None` | Subgroup analysis column |

### What `.explore()` populates

| Property | Description |
|---|---|
| `effects` | Grid columns + estimate (+ SE, CI, p after `.infer()`) |
## Predict

Use a fitted model to make predictions.

→ API: `.predict()`

```python
m.fit().predict()                # in-sample
m.fit().predict(newdata=new_df)  # out-of-sample
m.fit().predict(type="link")     # linear predictor (GLMs)
```

```r
predict(m)                                  # in-sample
predict(m, newdata = new_df)                # out-of-sample
predict(m, type = "link")                   # linear predictor (GLMs)
predict(m, newdata = new_df, re.form = NA)  # exclude random effects (mixed)
```

### Key parameters

| Parameter | Type | Default | Notes |
|---|---|---|---|
| `newdata` | `DataFrame \| None` | `None` | `None` = in-sample predictions |
| `type` | `str` | `"response"` | `"response"` or `"link"` |
| `varying` | `str` | `"exclude"` | `"include"` to use BLUPs in mixed models |
| `allow_new_levels` | `bool` | `False` | Allow unseen group levels (mixed models) |

### What `.predict()` populates

| Property | Description |
|---|---|
| `predictions` | `fitted` (+ `link` for GLMs; + SE, CI after `.infer()`) |
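For a logistic model the two prediction scales are connected by the inverse link; a numpy sketch (with made-up coefficients, not bossanova code) of what `type="link"` vs `type="response"` correspond to:

```python
import numpy as np

beta = np.array([-1.0, 2.0])            # illustrative intercept and slope
X = np.array([[1.0, 0.0], [1.0, 1.0]])  # two rows of a design matrix

eta = X @ beta                   # type="link": linear predictor
p = 1.0 / (1.0 + np.exp(-eta))   # type="response": inverse-logit, a probability
```

`eta` lives on the unbounded log-odds scale, while `p` is squeezed into (0, 1), which is why SEs and CIs differ between the two scales.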
## Infer

Compute uncertainty for parameter estimates, marginal effects, and model predictions.

→ API: `.infer()`

```python
m.fit().infer()                         # asymptotic (default)
m.fit().infer(how="boot", n_boot=5000)  # bootstrap
m.fit().infer(errors="HC3")             # robust SEs
m.fit().infer(how="cv", k=10)           # cross-validation
m.fit().explore("pairwise(x)").infer()  # infer on effects
m.fit().infer(how="profile")            # profile CIs for variance components
```

```r
# Asymptotic (default with lmerTest for mixed models)
summary(m)   # Satterthwaite p-values
confint(m)   # Wald CIs

# Bootstrap
confint(m, method = "boot", nsim = 5000)  # bootstrap CIs (mixed)
library(boot)
boot(data, function(d, i) coef(lm(y ~ x, d[i, ])), R = 5000)  # general

# Robust SEs
library(sandwich); library(lmtest)
coeftest(m, vcov = vcovHC(m, type = "HC3"))

# Profile CIs (mixed models)
confint(m, method = "profile")

# Infer on effects
pairs(emmeans(m, ~ x), infer = c(TRUE, TRUE))  # CIs + p-values
```

### `how` × operation compatibility

| | asymp | boot | perm | cv | profile | joint |
|---|---|---|---|---|---|---|
| `.fit()` → `params` | ✓ | ✓ | ✓ | | | ✓ |
| `.explore()` → `effects` | ✓ | ✓ | ✓ | | | ✓ |
| `.predict()` → `predictions` | ✓ | ✓ | | ✓ | | |
| `.fit()` → `varying_spread` | | | | | ✓ | |
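Conceptually, `how="boot"` refits the model on resampled rows and summarizes the resampled estimates. A minimal nonparametric percentile bootstrap for an OLS slope in plain numpy (illustrative only; bossanova's default interval is BCa via `ci_type="bca"`):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])

def slope(Xb, yb):
    # OLS slope on one bootstrap sample
    return np.linalg.lstsq(Xb, yb, rcond=None)[0][1]

boots = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)  # resample rows with replacement
    boots.append(slope(X[idx], y[idx]))
boots = np.array(boots)

lo, hi = np.percentile(boots, [2.5, 97.5])  # 95% percentile CI
```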
### Key parameters

| Parameter | Type | Default | Notes |
|---|---|---|---|
| `how` | `str` | `"asymp"` | Inference method (see matrix above) |
| `conf_level` | `float \| int \| str` | `0.95` | Accepts `0.95`, `95`, or `"95%"` |
| `errors` | `str` | `"iid"` | `"iid"`, `"HC0"`–`"HC3"`, `"hetero"`, `"unequal_var"` |
| `null` | `float` | `0.0` | Null hypothesis value |
| `alternative` | `str` | `"two-sided"` | `"two-sided"`, `"greater"`, `"less"` |
| `n_boot` | `int` | `1000` | Bootstrap resamples |
| `n_perm` | `int` | `1000` | Permutation resamples |
| `ci_type` | `str` | `"bca"` | `"bca"` or `"percentile"` |
| `k` | `int` | `10` | CV folds |
| `seed` | `int \| None` | `None` | Reproducibility |
| `n_jobs` | `int` | `1` | Parallel workers for resampling |
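The `errors="HC3"` option corresponds to the standard HC3 sandwich estimator (as in R's `vcovHC(m, type = "HC3")`); a numpy sketch of the textbook formula, not bossanova's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n) * (1.0 + np.abs(x))  # heteroskedastic errors

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages h_ii

# HC3 "meat": X' diag(e_i^2 / (1 - h_i)^2) X
meat = X.T @ (X * (e / (1.0 - h))[:, None] ** 2)
vcov_hc3 = XtX_inv @ meat @ XtX_inv           # sandwich: bread-meat-bread
se_hc3 = np.sqrt(np.diag(vcov_hc3))
```

Dividing each squared residual by `(1 - h_i)^2` inflates the contribution of high-leverage points, which is what distinguishes HC3 from HC0–HC2.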
## Summary & export

Print a nicely formatted R-style summary.

→ API: `.summary()` · `.show_math()`

| Method | Returns | Notes |
|---|---|---|
| `.summary()` | `FormattedText` | R-style summary; `style="compact"` for minimal |
| `.summary(digits=5)` | `FormattedText` | Control decimal places |
| `.show_math()` | `MathDisplay` | LaTeX equation (renders in notebooks) |
| `.to_odds_ratio()` | `DataFrame` | Exponentiated estimates (binomial only) |
| `.to_response_scale()` | `DataFrame` | Inverse-link on effects |
| `.to_effect_size()` | `DataFrame` | Cohen's d, η², r (semi-partial) |
| `.vif()` | `DataFrame` | Variance inflation factors |
| `.filter_params(terms)` | `DataFrame` | Subset params by term name(s) |
| `.filter_significant(alpha)` | `DataFrame` | Params where p < alpha |
| `.filter_effects(...)` | `DataFrame` | Subset effects by terms/levels/contrasts |
| `.jointtest()` | `DataFrame` | Type III F / χ² tests |
| `.to_markdown()` | `str` | Pipe-delimited markdown of last result |
| `.to_csv()` | `str` | CSV of last result |

| Function | Returns | Notes |
|---|---|---|
| `summary(m)` | Console output | Full model summary |
| `broom::tidy(m)` | tibble | Tidy coefficient table |
| `broom::glance(m)` | tibble | Model diagnostics (AIC, BIC, R², ...) |
| `exp(cbind(coef(m), confint(m)))` | matrix | Odds ratios (binomial) |
| `car::vif(m)` | vector | Variance inflation factors |
| `car::Anova(m, type = "III")` | data.frame | Type III F / χ² tests |
| `broom::tidy(m) \|> knitr::kable()` | string | Markdown table |
| `write.csv(broom::tidy(m), ...)` | file | CSV export |
## Simulate

Simulate data from a model with or without fitting.

→ API: `.simulate()`

```python
from bossanova.distributions import normal, categorical, varying

# Simulate a dataset from scratch (no data supplied)
m = model("y ~ x + group")
m.simulate(n=200, coef={"x": 0.5, "groupB": -1.0},
           x=normal(0, 1), group=categorical(["A", "B"]))
m.fit().infer()

# Simulate responses from a fitted model
m = model("y ~ x", data).fit()
m.simulate(nsim=1000)  # 1000 response vectors from fitted model

# Power analysis
m = model("y ~ x + group")
m.simulate(n=100, coef={"x": 0.5}, x=normal(0, 1),
           group=categorical(["A", "B"]),
           power=500)  # 500 simulations
m.power_results  # power, coverage, bias
```

```r
m <- lm(y ~ x, data)
simulate(m, nsim = 1000)  # 1000 response vectors from fitted model

library(simr)
m <- lmer(y ~ x + (1 | group), data)
powerSim(m, nsim = 500)                     # power via simulation
powerCurve(m, along = "group", nsim = 100)  # power across sample sizes
```

## Plotting

→ API: plot methods

All plot methods return a matplotlib Figure or seaborn FacetGrid.
## Model comparison

Compare nested and non-nested models using a variety of approaches.

→ API: `compare()`

```python
from bossanova import compare

compare(m1, m2, m3)                        # Auto: F (LM), deviance (GLM), LRT (mixed)
compare(m1, m2, method="lrt")              # Explicit likelihood ratio test
compare(m1, m2, method="aic")              # AIC comparison with delta-AIC and weights
compare(m1, m2, method="bic")              # BIC comparison with delta-BIC and weights
compare(m1, m2, method="cv", cv=10)        # Cross-validation comparison
compare(m1, m2, method="lrt", refit=True)  # Refit REML → ML for valid LRT
```

```r
anova(m1, m2, m3)            # F-test (LM) or LRT (mixed)
anova(m1, m2, test = "LRT")  # explicit likelihood ratio test
AIC(m1, m2, m3)              # AIC comparison
BIC(m1, m2, m3)              # BIC comparison
anova(m1, m2, refit = TRUE)  # refit REML → ML for valid LRT (lmerTest)
```

## Properties quick lookup

| Property | Returns | Requires |
|---|---|---|
| `params` | Coefficients (+ SE, CI, p after `.infer()`) | `.fit()` |
| `diagnostics` | AIC, BIC, loglik, R², deviance, etc. | `.fit()` |
| `effects` | EMMs / marginal slopes (+ SE, CI, p after `.infer()`) | `.explore()` |
| `predictions` | Fitted values (+ SE, CI after `.infer()`) | `.predict()` |
| `simulations` | Generated data or simulated responses | `.simulate()` |
| `resamples` | Raw bootstrap/permutation values | `.infer(how="boot"\|"perm")` |
| `power_results` | Power, coverage, bias, RMSE | `.simulate(power=...)` |
| `varying_offsets` | BLUPs per group | `.fit()` + mixed |
| `varying_params` | Population + BLUP coefficients | `.fit()` + mixed |
| `varying_spread` | Variance components (σ², τ², ICC) | `.fit()` + mixed |
| `varying_corr` | Random effect correlations | `.fit()` + mixed |
| `designmat` | Design matrix | data |
| `metadata` | nobs, nparams, ngroups | data |
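The delta-AIC and weights reported by `compare(method="aic")` follow the standard Akaike-weight formulas; a numpy sketch with made-up AIC values:

```python
import numpy as np

# Illustrative AIC values for three candidate models
aic = np.array([230.1, 232.4, 228.7])

delta = aic - aic.min()        # delta-AIC: 0 for the best model
w = np.exp(-0.5 * delta)
weights = w / w.sum()          # Akaike weights sum to 1
best = int(np.argmin(aic))     # lowest AIC wins
```

The weight of a model can be read as its relative support within the candidate set, which is why the weights are normalized to sum to one.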
## Utilities

| Function | Purpose |
|---|---|
| `compare(*models)` | Model comparison: auto-selects F / deviance / LRT by type; or `method="aic"`, `"bic"`, `"cv"` |
| `load_dataset(name)` | Load a built-in dataset |
| `show_datasets()` | List all available datasets |
| `set_backend("jax")` | Switch to JAX backend (default: `"numpy"`) |
| `get_backend()` | Current backend name |
| `set_display_digits(n)` | Set decimal places for display |
| `to_markdown(df)` | Export any DataFrame as markdown |
## Common patterns

```python
# Full workflow — one chain
m = model("y ~ x * group", data).fit().infer()
m.params     # estimates with CIs and p-values
m.summary()  # R-style summary table

# Explore + infer
m.explore("pairwise(group)").infer()
m.effects  # pairwise contrasts with CIs

# Robust standard errors
m.fit().infer(errors="HC3")

# Bootstrap with parallel workers
m.fit().infer(how="boot", n_boot=5000, n_jobs=4)

# Mixed model with profile CIs on variance components
m = model("y ~ x + (1 | subj)", data).fit().infer(how="profile")
m.varying_spread  # σ², τ² with profile CIs
```

```r
# Full workflow
library(lme4); library(lmerTest); library(emmeans)
m <- lm(y ~ x * group, data)
summary(m)  # estimates with p-values
confint(m)  # CIs

# Explore + infer
pairs(emmeans(m, ~ group), infer = c(TRUE, TRUE))

# Robust standard errors
library(sandwich); library(lmtest)
coeftest(m, vcov = vcovHC(m, type = "HC3"))

# Bootstrap CIs
library(boot)
b <- boot(data, function(d, i) coef(lm(y ~ x, d[i, ])), R = 5000)
boot.ci(b, type = "bca")

# Mixed model with profile CIs
m <- lmer(y ~ x + (1 | subj), data)
summary(m)
confint(m, method = "profile")  # profile CIs on all parameters
VarCorr(m)                      # variance components
```