Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Heteroscedastic Errors

UCSD Psychology

Every linear model makes assumptions about the errors ϵi\epsilon_i. The default assumption is iid (independent and identically distributed): all errors have the same variance and are uncorrelated. When this assumption doesn’t hold, you don’t need a different test—you just need a different error assumption.

bossanova lets you switch error assumptions with a single argument to .infer():

Assumptionerrors=When to use
Equal variance (default)"iid"Standard setting, homoscedastic data
Unequal group variances"unequal_var"Groups have different spreads
Unknown heteroscedasticity"hetero"Variance depends on predictors

The model stays the same—only the inference changes. This is the model-based advantage: separating the structural model from the error model.

Equal Variance (Default)

The standard assumption: all errors share the same variance σ2\sigma^2. Under this assumption, the coefficient variance is:

Var(β^)=σ^2(XX)1,σ^2=RSSnp\text{Var}(\hat{\beta}) = \hat{\sigma}^2 (\mathbf{X}'\mathbf{X})^{-1}, \quad \hat{\sigma}^2 = \frac{\text{RSS}}{n - p}

This gives the classical t-test when comparing two groups.

from scipy.stats import ttest_ind

# scipy (equal variance)
scipy_eq = ttest_ind(gentoo, adelie, equal_var=True)
scipy_eq
TtestResult(statistic=np.float64(23.466803147391744), pvalue=np.float64(1.8806652580952688e-66), df=np.float64(263.0))
df = penguins.filter(pl.col("species").is_in(["Adelie", "Gentoo"]))
m = model("body_mass_g ~ species", df).fit().infer(errors="iid")

m.params[1].select("term", "statistic", "df", "p_value")
Loading...

Unequal Group Variances

When groups have different spreads, the standard t-test is anticonservative. The "unequal_var" option replaces the pooled σ^2\hat{\sigma}^2 with per-group variance estimates and applies a Welch-Satterthwaite degrees of freedom adjustment — the same correction behind Welch’s t-test:

Var(β^j)=gsg2ngcgj2,dfj=Var(β^j)2g(sg2cgj2/ng)2ng1\text{Var}(\hat{\beta}_j) = \sum_g \frac{s_g^2}{n_g} c_{gj}^2, \quad df_j = \frac{\text{Var}(\hat{\beta}_j)^2}{\sum_g \frac{(s_g^2 c_{gj}^2 / n_g)^2}{n_g - 1}}

where cgjc_{gj} are the contrast coefficients for group gg on parameter jj.

scipy

scipy_welch = ttest_ind(gentoo, adelie, equal_var=False)
scipy_welch
TtestResult(statistic=np.float64(23.25392442915641), pvalue=np.float64(1.223170419256714e-63), df=np.float64(242.14429956468885))

bossanova

m_welch = model("body_mass_g ~ species", df).fit().infer(errors="unequal_var")

m_welch.params[1].select("term", "statistic", "df", "p_value")
Loading...

The fractional df (instead of the integer npn - p) confirms the Welch-Satterthwaite adjustment is active.

This works with covariates too—no classical test equivalent:

m_cov = model("body_mass_g ~ species + flipper_length_mm", penguins).fit().infer(
    errors="unequal_var"
)

m_cov.params.select("term", "estimate", "df", "p_value")
Loading...

Heteroscedasticity-Consistent Errors

When you suspect non-constant variance but don’t know its form, use errors="hetero". This applies a sandwich (HC3) variance estimator that gives valid standard errors regardless of the variance structure:

Var(β^)=(XX)1XΩX(XX)1\text{Var}(\hat{\beta}) = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}' \boldsymbol{\Omega} \mathbf{X} (\mathbf{X}'\mathbf{X})^{-1}

where Ω=diag(e^i2/(1hii)2)\boldsymbol{\Omega} = \text{diag}(\hat{e}_i^2 / (1-h_{ii})^2) with hiih_{ii} being leverage.

df_reg = penguins.select("body_mass_g", "flipper_length_mm")

# Standard (assumes equal variance)
m_iid = model("body_mass_g ~ flipper_length_mm", df_reg).fit().infer(errors="iid")

# Heteroscedasticity-consistent
m_hetero = model("body_mass_g ~ flipper_length_mm", df_reg).fit().infer(errors="hetero")

pl.DataFrame({
    "errors": ["iid", "hetero"],
    "slope_se": [
        float(m_iid.params["se"][1]),
        float(m_hetero.params["se"][1])
    ],
    "slope_p": [
        float(m_iid.params["p_value"][1]),
        float(m_hetero.params["p_value"][1])
    ]
})
Loading...

When the equal-variance assumption holds, both methods give similar SEs. When it doesn’t, the "hetero" SEs are more trustworthy.