Every linear model makes assumptions about the errors. The default assumption is iid (independent and identically distributed): all errors have the same variance and are uncorrelated. When this assumption doesn’t hold, you don’t need a different test—you just need a different error assumption.
bossanova lets you switch error assumptions with a single argument to .infer():
| Assumption | errors= | When to use |
|---|---|---|
| Equal variance (default) | "iid" | Standard setting, homoscedastic data |
| Unequal group variances | "unequal_var" | Groups have different spreads |
| Unknown heteroscedasticity | "hetero" | Variance depends on predictors |
The model stays the same—only the inference changes. This is the model-based advantage: separating the structural model from the error model.
Equal Variance (Default)¶
The standard assumption: all errors share the same variance $\sigma^2$. Under this assumption, the coefficient variance is:

$$\mathrm{Var}(\hat{\beta}) = \sigma^2 (X^\top X)^{-1}$$

This gives the classical t-test when comparing two groups.
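Under the hood this is just the pooled-variance t statistic. Here is a minimal numpy/scipy sketch of that computation, checked against scipy's own result; the arrays are simulated stand-ins (the actual `gentoo` and `adelie` arrays used below come from earlier cells):

```python
# Sketch: the pooled (equal-variance) t-test by hand.
# Simulated stand-in data, not the penguin arrays.
import numpy as np
from scipy.stats import ttest_ind, t as t_dist

rng = np.random.default_rng(0)
a = rng.normal(3700, 450, size=150)
b = rng.normal(5000, 500, size=120)

n1, n2 = len(a), len(b)
df = n1 + n2 - 2
# Pooled variance: a single sigma^2 shared by both groups (the iid assumption)
s2_pooled = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / df
se = np.sqrt(s2_pooled * (1 / n1 + 1 / n2))
t_stat = (b.mean() - a.mean()) / se
p = 2 * t_dist.sf(abs(t_stat), df)

ref = ttest_ind(b, a, equal_var=True)
assert np.isclose(t_stat, ref.statistic)
assert np.isclose(p, ref.pvalue)
```

Note that the degrees of freedom here are the integer $n_1 + n_2 - 2$; that detail matters for the Welch comparison below.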
from scipy.stats import ttest_ind
# scipy (equal variance)
scipy_eq = ttest_ind(gentoo, adelie, equal_var=True)
scipy_eq
TtestResult(statistic=np.float64(23.466803147391744), pvalue=np.float64(1.8806652580952688e-66), df=np.float64(263.0))
df = penguins.filter(pl.col("species").is_in(["Adelie", "Gentoo"]))
m = model("body_mass_g ~ species", df).fit().infer(errors="iid")
m.params[1].select("term", "statistic", "df", "p_value")
Unequal Group Variances¶
When groups have different spreads, the standard t-test is anticonservative. The "unequal_var" option replaces the pooled variance $\sigma^2$ with per-group variance estimates and applies a Welch-Satterthwaite degrees of freedom adjustment — the same correction behind Welch’s t-test:

$$\nu_j = \frac{\left(\sum_g c_{gj}^2 \, s_g^2 / n_g\right)^2}{\sum_g \left(c_{gj}^2 \, s_g^2 / n_g\right)^2 / (n_g - 1)}$$

where $c_{gj}$ are the contrast coefficients for group $g$ on parameter $j$.
scipy¶
scipy_welch = ttest_ind(gentoo, adelie, equal_var=False)
scipy_welch
TtestResult(statistic=np.float64(23.25392442915641), pvalue=np.float64(1.223170419256714e-63), df=np.float64(242.14429956468885))
bossanova¶
m_welch = model("body_mass_g ~ species", df).fit().infer(errors="unequal_var")
m_welch.params[1].select("term", "statistic", "df", "p_value")
The fractional df (instead of the integer 263) confirms the Welch-Satterthwaite adjustment is active.
This works with covariates too—no classical test equivalent:
m_cov = model("body_mass_g ~ species + flipper_length_mm", penguins).fit().infer(
errors="unequal_var"
)
m_cov.params.select("term", "estimate", "df", "p_value")
Heteroscedasticity-Consistent Errors¶
When you suspect non-constant variance but don’t know its form, use errors="hetero". This applies a sandwich (HC3) variance estimator that gives valid standard errors regardless of the variance structure:
$$\widehat{\mathrm{Var}}(\hat{\beta}) = (X^\top X)^{-1} X^\top \hat{\Omega} X (X^\top X)^{-1}$$

where $\hat{\Omega} = \mathrm{diag}\left(\hat{e}_i^2 / (1 - h_{ii})^2\right)$ with $h_{ii}$ being the leverage.
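The sandwich structure is straightforward to compute by hand. A numpy sketch on simulated heteroscedastic data (noise scale grows with the predictor), comparing the iid and HC3 standard errors:

```python
# Sketch: the HC3 sandwich estimator by hand.
# Simulated data: noise sd grows with x, so iid and HC3 SEs differ.
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(170, 230, size=n)
y = 20 * x + rng.normal(0, x - 150)  # heteroscedastic noise

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages h_ii

# Sandwich: (X'X)^-1 X' diag(e_i^2 / (1 - h_ii)^2) X (X'X)^-1
omega = resid**2 / (1 - h) ** 2
cov_hc3 = XtX_inv @ (X.T * omega) @ X @ XtX_inv
se_hc3 = np.sqrt(np.diag(cov_hc3))

# iid for comparison: sigma^2 (X'X)^-1
sigma2 = resid @ resid / (n - 2)
se_iid = np.sqrt(np.diag(sigma2 * XtX_inv))
```

The `(1 - h_ii)^-2` factor is what distinguishes HC3 from the plain White (HC0) estimator; it inflates the contribution of high-leverage points, which improves small-sample behavior.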
df_reg = penguins.select("body_mass_g", "flipper_length_mm")
# Standard (assumes equal variance)
m_iid = model("body_mass_g ~ flipper_length_mm", df_reg).fit().infer(errors="iid")
# Heteroscedasticity-consistent
m_hetero = model("body_mass_g ~ flipper_length_mm", df_reg).fit().infer(errors="hetero")
pl.DataFrame({
"errors": ["iid", "hetero"],
"slope_se": [
float(m_iid.params["se"][1]),
float(m_hetero.params["se"][1])
],
"slope_p": [
float(m_iid.params["p_value"][1]),
float(m_hetero.params["p_value"][1])
]
})
When the equal-variance assumption holds, both methods give similar SEs. When it doesn’t, the "hetero" SEs are more trustworthy.
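That closing claim can be checked with a small simulation: under heteroscedastic noise, the average iid SE drifts away from the true sampling variability of the slope, while the average HC3 SE stays close to it. A hedged sketch, with all names and settings illustrative:

```python
# Sketch: Monte Carlo check that HC3 SEs track the true sampling sd
# of the slope under heteroscedastic noise, while iid SEs do not.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 500
x = rng.uniform(0, 1, size=n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages

slopes, se_iid, se_hc3 = [], [], []
for _ in range(reps):
    y = 1 + 2 * x + rng.normal(0, 0.2 + 2 * x)  # noise sd grows with x
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    slopes.append(beta[1])
    se_iid.append(np.sqrt((e @ e / (n - 2)) * XtX_inv[1, 1]))
    omega = e**2 / (1 - h) ** 2
    cov = XtX_inv @ (X.T * omega) @ X @ XtX_inv
    se_hc3.append(np.sqrt(cov[1, 1]))

true_sd = np.std(slopes)  # empirical sampling sd of the slope
print(true_sd, np.mean(se_iid), np.mean(se_hc3))
```

On this design the mean HC3 SE lands noticeably closer to the empirical sampling sd than the mean iid SE does.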