ANOVA Tests - bossanova

| Classical Test | bossanova Equivalent | Use Case |
| --- | --- | --- |
| One-way ANOVA | `model("y ~ group", df).infer("joint").effects` | 3+ groups, equal variance |
| Welch’s ANOVA | `model("y ~ group", df).infer("joint").infer(errors="unequal_var").effects` | 3+ groups, unequal variance |
| Kruskal-Wallis | `model("rank(y) ~ group", df).infer("joint").effects` | 3+ groups, robust |
| Two-way ANOVA | `model("y ~ a * b", df).infer("joint").effects` | Factorial design |
| ANCOVA | `model("y ~ group + covariate", df)` | Group comparison with adjustment |

All examples use the included penguins dataset.

import numpy as np
import polars as pl
from bossanova.model import model
from bossanova import load_dataset

np.random.seed(42)

# Load penguins dataset
penguins = load_dataset("penguins").drop_nulls().filter(pl.col("sex") != "NA")

One-Way ANOVA

Classical:

$$
F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{SS_B/(k-1)}{SS_W/(N-k)}, \quad F \sim F(k-1,\; N-k) \text{ under } H_0 \tag{1}
$$

As GLM:

$$
y_{ij} \sim \mathcal{N}(\mu_j, \sigma^2), \quad \mu_j = \beta_0 + \beta_1 x_{1j} + \cdots + \beta_{k-1} x_{(k-1)j} \tag{2}
$$

Joint test $H_0: \beta_1 = \cdots = \beta_{k-1} = 0$ yields the same $F(k-1, N-k)$ statistic.
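The equivalence can be checked numerically without bossanova. The following is a minimal sketch on synthetic data (the group means, sizes, and seed are arbitrary assumptions, not the penguins data): it computes $F$ once from the between/within sums of squares and once as a nested-model comparison on a treatment-coded design matrix.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic data: three groups with different means and a shared variance (illustrative only)
groups = [rng.normal(loc=mu, scale=1.0, size=n) for mu, n in [(0.0, 20), (0.5, 25), (1.5, 30)]]
y = np.concatenate(groups)
labels = np.repeat(np.arange(3), [len(g) for g in groups])
N, k = len(y), len(groups)

# Classical route: between- and within-group sums of squares
grand_mean = y.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_classical = (ss_between / (k - 1)) / (ss_within / (N - k))

# GLM route: intercept + (k-1) treatment-coded dummies, compared with the intercept-only model
X = np.column_stack([np.ones(N)] + [(labels == j).astype(float) for j in range(1, k)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss_full = ((y - X @ beta) ** 2).sum()
rss_null = ((y - grand_mean) ** 2).sum()
f_glm = ((rss_null - rss_full) / (k - 1)) / (rss_full / (N - k))
p_glm = stats.f.sf(f_glm, k - 1, N - k)

# Reference value from scipy
f_scipy, p_scipy = stats.f_oneway(*groups)
print(f_classical, f_glm, f_scipy)
```

The three values agree up to floating-point error because the residual sum of squares of the dummy-coded model is exactly $SS_W$, and the drop in residual sum of squares relative to the intercept-only model is exactly $SS_B$.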

scipy

from scipy.stats import f_oneway

adelie = penguins.filter(pl.col("species") == "Adelie")["body_mass_g"].to_numpy()
chinstrap = penguins.filter(pl.col("species") == "Chinstrap")["body_mass_g"].to_numpy()
gentoo = penguins.filter(pl.col("species") == "Gentoo")["body_mass_g"].to_numpy()

scipy_anova = f_oneway(adelie, chinstrap, gentoo)
scipy_anova

F_onewayResult(statistic=np.float64(341.8948949481461), pvalue=np.float64(3.74450512630046e-81))

bossanova

m = model("body_mass_g ~ species", penguins).fit().infer("joint")

m.effects

anova_result = m.effects
bn_f = float(anova_result["f_ratio"][0])
bn_p = float(anova_result["p_value"][0])
assert np.isclose(bn_f, scipy_anova.statistic, rtol=1e-3), f"F mismatch: {bn_f} vs {scipy_anova.statistic}"
assert np.isclose(bn_p, scipy_anova.pvalue, rtol=1e-3), f"p mismatch: {bn_p} vs {scipy_anova.pvalue}"

Welch’s ANOVA (Unequal Variances)

Classical:

$$
F_W = \frac{1}{k-1} \frac{\displaystyle\sum_{j=1}^{k} w_j (\bar{x}_j - \hat{\mu}_w)^2}{1 + \frac{2(k-2)}{k^2-1} \displaystyle\sum_{j=1}^{k} \frac{(1 - w_j/W)^2}{n_j - 1}}, \quad w_j = \frac{n_j}{s_j^2}, \quad W = \sum w_j \tag{3}
$$

$$
\hat{\mu}_w = \frac{\sum w_j \bar{x}_j}{W}, \quad F_W \dot{\sim} F(k-1,\; df_W), \quad df_W = \frac{k^2-1}{3 \displaystyle\sum_{j=1}^{k} \frac{(1 - w_j/W)^2}{n_j - 1}} \tag{4}
$$

As GLM:

$$
y_{ij} \sim \mathcal{N}(\mu_j, \sigma_j^2), \quad \mu_j = \beta_0 + \beta_1 x_{1j} + \cdots + \beta_{k-1} x_{(k-1)j} \tag{5}
$$

The structural model is the same as in one-way ANOVA; only the variance estimator changes. Allowing a separate variance $\sigma_j^2$ per group yields Welch-Satterthwaite adjusted degrees of freedom for each coefficient’s $t$-test.
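The classical Welch statistic can be computed directly from per-group means and variances. The sketch below is a plain NumPy/SciPy illustration on synthetic data; `welch_anova` is a hypothetical helper written for this example and is not part of bossanova or scipy. The denominator degrees of freedom use the standard Welch approximation $(k^2-1)/(3\Lambda)$ with $\Lambda = \sum_j (1 - w_j/W)^2/(n_j - 1)$.

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's F, its degrees of freedom, and p-value (hypothetical helper for illustration)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])

    w = n / variances                       # w_j = n_j / s_j^2
    W = w.sum()
    mu_w = (w * means).sum() / W            # variance-weighted grand mean
    lam = (((1 - w / W) ** 2) / (n - 1)).sum()

    numerator = (w * (means - mu_w) ** 2).sum() / (k - 1)
    denominator = 1 + 2 * (k - 2) / (k**2 - 1) * lam
    f_w = numerator / denominator
    df1, df2 = k - 1, (k**2 - 1) / (3 * lam)
    return f_w, df1, df2, stats.f.sf(f_w, df1, df2)

rng = np.random.default_rng(1)
# Synthetic groups with deliberately unequal variances and sizes
a = rng.normal(0.0, 1.0, size=15)
b = rng.normal(0.8, 3.0, size=40)
c = rng.normal(1.5, 0.5, size=25)

f_w, df1, df2, p_w = welch_anova(a, b, c)
print(f_w, df1, df2, p_w)

# Sanity check: with two groups, Welch's F reduces to the squared Welch t-statistic
f2, _, _, p2 = welch_anova(a, b)
t_res = stats.ttest_ind(a, b, equal_var=False)
print(f2, t_res.statistic**2)
```

The two-group check works because for $k = 2$ the correction term in the denominator vanishes and the denominator degrees of freedom reduce to the Welch-Satterthwaite formula for the two-sample $t$-test.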

bossanova

m_welch = model("body_mass_g ~ species", penguins).fit().infer(errors="unequal_var")

m_welch.params.select("term", "estimate", "df", "p_value")

# Welch df should differ from standard n-k df, confirming Satterthwaite adjustment
n_k = len(penguins) - len(m_welch.params)
welch_dfs = [float(m_welch.params["df"][i]) for i in range(1, len(m_welch.params))]
assert all(df != n_k for df in welch_dfs), f"Expected Welch-adjusted df (not {n_k}), got {welch_dfs}"
# At least one species effect should be significant
welch_ps = [float(m_welch.params["p_value"][i]) for i in range(1, len(m_welch.params))]
assert any(p < 0.05 for p in welch_ps), f"Expected at least one significant species effect, got p-values {welch_ps}"

Kruskal-Wallis Test

Classical:

$$
H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1), \quad H \dot{\sim} \chi^2(k-1) \text{ under } H_0 \tag{6}
$$

where $R_j$ is the sum of ranks in group $j$.

As GLM:

$$
y_{ij}^* \sim \mathcal{N}(\mu_j, \sigma^2), \quad \mu_j = \beta_0 + \beta_1 x_{1j} + \cdots, \quad \text{where } y_{ij}^* = \text{rank}(y_{ij}) \tag{7}
$$

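Joint test $H_0: \beta_1 = \cdots = \beta_{k-1} = 0$ yields an $F$ statistic on the ranks that is a monotone increasing function of $H$; under $H_0$ with large $N$, $F_{\text{ranks}} \approx H/(k-1)$. The rank transformation makes ANOVA robust to outliers and less sensitive to non-normality.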
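The link between the rank-based $F$ and $H$ is an exact algebraic identity, $F_{\text{ranks}} = \dfrac{H/(k-1)}{(N-1-H)/(N-k)}$, which follows from $H = (N-1)\,SS_B/SS_T$ computed on the ranks. The sketch below checks it on synthetic, tie-free data using scipy only; the group means and sizes are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Synthetic continuous data, so there are no ties in the ranks
groups = [rng.normal(mu, 1.0, size=n) for mu, n in [(0.0, 30), (0.4, 35), (1.0, 40)]]
sizes = [len(g) for g in groups]
N, k = sum(sizes), len(groups)

# Rank the pooled sample, then split the ranks back into their groups
ranks = stats.rankdata(np.concatenate(groups))
rank_groups = np.split(ranks, np.cumsum(sizes)[:-1])

H, p_kw = stats.kruskal(*groups)             # chi-square reference distribution
F_ranks, p_f = stats.f_oneway(*rank_groups)  # F reference distribution on ranks

# Exact relation between the two statistics
F_from_H = (H / (k - 1)) / ((N - 1 - H) / (N - k))
print(F_ranks, F_from_H)
print(p_kw, p_f)
```

Because the statistics are monotonically related, the two tests order datasets identically; their p-values differ only because one refers $H$ to $\chi^2(k-1)$ and the other refers $F$ to $F(k-1, N-k)$. The gap grows as $H$ approaches $N-1$, that is, when the groups are strongly separated.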

scipy

from scipy.stats import kruskal

scipy_kw = kruskal(
    penguins.filter(pl.col("species") == "Adelie")["body_mass_g"].to_numpy(),
    penguins.filter(pl.col("species") == "Chinstrap")["body_mass_g"].to_numpy(),
    penguins.filter(pl.col("species") == "Gentoo")["body_mass_g"].to_numpy(),
)
scipy_kw

KruskalResult(statistic=np.float64(212.08513173193893), pvalue=np.float64(8.836876744281845e-47))

bossanova

m_kw = model("rank(body_mass_g) ~ species", penguins).fit().infer("joint")

m_kw.effects

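scipy reports the Kruskal-Wallis H statistic, referred to a χ² distribution; bossanova reports an F-ratio computed on the ranks. The test statistics differ, but both test H₀: all group locations are equal. Because their p-values come from different reference distributions, they agree in conclusion rather than digit for digit.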

bn_f_kw = float(m_kw.effects["f_ratio"][0])
bn_p_kw = float(m_kw.effects["p_value"][0])
# Both methods should strongly reject (large species difference in body mass)
assert bn_p_kw < 1e-10, f"Expected highly significant bossanova p-value, got {bn_p_kw}"
assert scipy_kw.pvalue < 1e-10, f"Expected highly significant scipy p-value, got {scipy_kw.pvalue}"
# Both p-values are far below np.isclose's default atol (1e-8), so this only confirms that both are effectively zero
assert np.isclose(bn_p_kw, scipy_kw.pvalue, rtol=0.5), f"p-value gap too large: {bn_p_kw} vs {scipy_kw.pvalue}"

Two-Way ANOVA

Classical:

$$
F_A = \frac{MS_A}{MS_E} \sim F(a-1,\; N-ab), \quad F_B = \frac{MS_B}{MS_E} \sim F(b-1,\; N-ab), \quad F_{AB} = \frac{MS_{AB}}{MS_E} \sim F((a-1)(b-1),\; N-ab) \tag{8}
$$

As GLM:

$$
y_{ijk} \sim \mathcal{N}(\mu_{ij}, \sigma^2), \quad \mu_{ij} = \beta_0 + \alpha_i + \gamma_j + (\alpha\gamma)_{ij} \tag{9}
$$

Each classical $F$ corresponds to a joint test on the corresponding GLM coefficients. The interaction $(\alpha\gamma)_{ij}$ tests whether the effect of one factor depends on the level of the other.
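For the interaction term, the correspondence can be illustrated as a nested-model comparison. The sketch below uses a balanced synthetic design (the sizes, effects, and helper names `dummies` and `rss` are assumptions made for this example): it compares the additive and full models by least squares and checks the result against the classical $MS_{AB}/MS_E$ from cell means.

```python
import numpy as np

rng = np.random.default_rng(3)
a_levels, b_levels, n_rep = 3, 2, 10   # balanced: n_rep observations per cell
A = np.repeat(np.arange(a_levels), b_levels * n_rep)
B = np.tile(np.repeat(np.arange(b_levels), n_rep), a_levels)
# Synthetic response with main effects and an interaction in one cell
y = 1.0 * A + 0.5 * B + 0.8 * ((A == 2) & (B == 1)) + rng.normal(0, 1, size=A.size)
N = y.size

def dummies(codes, n_levels):
    """Treatment-coded indicator columns, dropping the first level."""
    return np.column_stack([(codes == j).astype(float) for j in range(1, n_levels)])

def rss(X):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return ((y - X @ beta) ** 2).sum()

DA, DB = dummies(A, a_levels), dummies(B, b_levels)
DAB = np.column_stack([DA[:, i] * DB[:, j]
                       for i in range(DA.shape[1]) for j in range(DB.shape[1])])
X_add = np.column_stack([np.ones(N), DA, DB])   # y ~ a + b
X_full = np.column_stack([X_add, DAB])          # y ~ a * b

df_ab, df_err = (a_levels - 1) * (b_levels - 1), N - a_levels * b_levels
f_ab_glm = ((rss(X_add) - rss(X_full)) / df_ab) / (rss(X_full) / df_err)

# Classical interaction sum of squares from cell, row, and column means
cell = y.reshape(a_levels, b_levels, n_rep)
cell_means = cell.mean(axis=2)
row = cell_means.mean(axis=1, keepdims=True)
col = cell_means.mean(axis=0, keepdims=True)
ss_ab = n_rep * ((cell_means - row - col + cell_means.mean()) ** 2).sum()
ms_e = ((cell - cell_means[:, :, None]) ** 2).sum() / df_err
f_ab_classical = (ss_ab / df_ab) / ms_e
print(f_ab_glm, f_ab_classical)
```

The agreement shown here relies on the design being balanced. With unequal cell sizes, as in the penguins data, the main-effect sums of squares depend on the order or type of testing, so results for those terms can differ between software packages.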

bossanova

m_2way = model("body_mass_g ~ species * sex", penguins).fit().infer("joint")

m_2way.effects

bn_f_2way = float(m_2way.effects["f_ratio"][0])
assert bn_f_2way > 0, f"Expected positive F-ratio, got {bn_f_2way}"
assert float(m_2way.effects["p_value"][0]) < 0.05, "Expected significant joint F-test"

ANCOVA (Covariate Adjustment)

Classical:

$$
\bar{y}_j^{\text{adj}} = \bar{y}_j - \hat{\beta}_w(\bar{x}_j - \bar{x}_{\cdot\cdot}), \quad F = \frac{MS_{\text{groups (adj)}}}{MS_E} \sim F(k-1,\; N-k-1) \text{ under } H_0 \tag{10}
$$

where $\hat{\beta}_w$ is the pooled within-group regression slope.

As GLM:

$$
y_i \sim \mathcal{N}(\mu_i, \sigma^2), \quad \mu_i = \beta_0 + \beta_1 g_i + \beta_2 x_i \tag{11}
$$

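The group effect $\beta_1$ is the adjusted mean difference after controlling for the covariate $x_i$. With $k$ groups, $g_i$ expands to $k-1$ indicator columns, one coefficient per non-reference group. ANCOVA reduces residual variance when the covariate predicts the outcome, and it adjusts the group comparison for covariate differences between groups.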
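The match between the classical adjusted means and the GLM coefficients can be verified with a few lines of NumPy. This is a sketch on synthetic data with three groups and one covariate; all effect sizes and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
sizes = [20, 25, 30]
g = np.repeat(np.arange(3), sizes)
# Covariate whose mean differs by group, so adjustment actually matters
x = rng.normal(loc=np.array([0.0, 0.5, 1.0])[g], scale=1.0)
y = 2.0 + np.array([0.0, 1.0, -0.5])[g] + 1.5 * x + rng.normal(0, 1, size=g.size)

# Classical route: pooled within-group slope and covariate-adjusted group means
x_bar = np.array([x[g == j].mean() for j in range(3)])
y_bar = np.array([y[g == j].mean() for j in range(3)])
xc, yc = x - x_bar[g], y - y_bar[g]           # center within each group
beta_w = (xc * yc).sum() / (xc**2).sum()
adj_means = y_bar - beta_w * (x_bar - x.mean())

# GLM route: intercept + treatment-coded group dummies + covariate
X = np.column_stack([np.ones(g.size), g == 1, g == 2, x]).astype(float)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_w, coef[3])                         # pooled slope vs. covariate coefficient
print(adj_means[1] - adj_means[0], coef[1])    # adjusted difference vs. group coefficient
print(adj_means[2] - adj_means[0], coef[2])
```

The covariate coefficient equals the pooled within-group slope because the group indicators absorb all between-group variation in both $x$ and $y$, leaving only within-group variation to determine the slope.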

bossanova

# Compare species controlling for flipper length
m_ancova = model("body_mass_g ~ species + flipper_length_mm", penguins).fit().infer()

m_ancova.params.select("term", "estimate", "p_value")

# All terms should be significant (species and flipper length both predict body mass)
for i in range(1, len(m_ancova.params)):
    assert float(m_ancova.params["p_value"][i]) < 0.05, f"Expected significant term at row {i}"
# Flipper length coefficient should be positive (longer flippers → heavier penguins)
flipper_row = next(i for i in range(len(m_ancova.params)) if "flipper" in str(m_ancova.params["term"][i]))
assert float(m_ancova.params["estimate"][flipper_row]) > 0, "Expected positive flipper length coefficient"
# ANCOVA R² should exceed species-only model (covariate reduces residual variance)
m_species_only = model("body_mass_g ~ species", penguins).fit()
r2_ancova = float(m_ancova.diagnostics["rsquared"][0])
r2_species = float(m_species_only.diagnostics["rsquared"][0])
assert r2_ancova > r2_species, f"ANCOVA R² ({r2_ancova}) should exceed species-only ({r2_species})"