Classical Test bossanova Equivalent Variance Assumption Independent t-test model("y ~ group", df)Equal (pooled) Welch’s t-test model("y ~ group", df).infer(errors="unequal_var")Unequal (Welch df ) Mann-Whitney U model("rank(y) ~ group", df)Robust to outliers
All examples use the included penguins dataset.
import numpy as np
import polars as pl
from bossanova.model import model
from bossanova import load_dataset
np.random.seed(42)
# Load penguins dataset (filter out "NA" sex values which are strings, not nulls)
penguins = load_dataset("penguins").drop_nulls().filter(pl.col("sex") != "NA")
Independent t-test (Equal Variances) ¶ Classical:
t = x ˉ 1 − x ˉ 2 s p 1 n 1 + 1 n 2 , s p 2 = ( n 1 − 1 ) s 1 2 + ( n 2 − 1 ) s 2 2 n 1 + n 2 − 2 , t ∼ t ( n 1 + n 2 − 2 ) under H 0 t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}, \quad t \sim t(n_1+n_2-2) \text{ under } H_0 t = s p n 1 1 + n 2 1 x ˉ 1 − x ˉ 2 , s p 2 = n 1 + n 2 − 2 ( n 1 − 1 ) s 1 2 + ( n 2 − 1 ) s 2 2 , t ∼ t ( n 1 + n 2 − 2 ) under H 0 As GLM :
y i ∼ N ( μ i , σ 2 ) , μ i = β 0 + β 1 g i , g i ∈ { 0 , 1 } y_i \sim \mathcal{N}(\mu_i, \sigma^2), \quad \mu_i = \beta_0 + \beta_1 g_i, \quad g_i \in \{0, 1\} y i ∼ N ( μ i , σ 2 ) , μ i = β 0 + β 1 g i , g i ∈ { 0 , 1 }
t = β ^ 1 SE ( β ^ 1 ) ∼ t ( n 1 + n 2 − 2 ) under H 0 : β 1 = 0 t = \frac{\hat{\beta}_1}{\text{SE}(\hat{\beta}_1)} \sim t(n_1+n_2-2) \text{ under } H_0: \beta_1 = 0 t = SE ( β ^ 1 ) β ^ 1 ∼ t ( n 1 + n 2 − 2 ) under H 0 : β 1 = 0 The slope β ^ 1 = x ˉ 1 − x ˉ 2 \hat{\beta}_1 = \bar{x}_1 - \bar{x}_2 β ^ 1 = x ˉ 1 − x ˉ 2 is the group mean difference; t classical = t β 1 t_{\text{classical}} = t_{\beta_1} t classical = t β 1 exactly.
scipy ¶ from scipy.stats import ttest_ind
male_bill = penguins.filter(pl.col("sex") == "male")["bill_depth_mm"].to_numpy()
female_bill = penguins.filter(pl.col("sex") == "female")["bill_depth_mm"].to_numpy()
scipy_ttest_eq = ttest_ind(male_bill, female_bill, equal_var=True)
scipy_ttest_eqTtestResult(statistic=np.float64(7.306540245129378), pvalue=np.float64(2.066410345755146e-12), df=np.float64(331.0))
bossanova ¶ m = model("bill_depth_mm ~ sex", penguins).fit().infer()
m.params[1].select("statistic", "df", "p_value")bn_t = float(m.params["statistic"][1])
bn_p = float(m.params["p_value"][1])
assert np.isclose(abs(bn_t), abs(scipy_ttest_eq.statistic), rtol=1e-3), f"t mismatch: {bn_t} vs {scipy_ttest_eq.statistic}"
assert np.isclose(bn_p, scipy_ttest_eq.pvalue, rtol=1e-3), f"p mismatch: {bn_p} vs {scipy_ttest_eq.pvalue}"
Welch’s t-test (Unequal Variances) ¶ Classical:
t = x ˉ 1 − x ˉ 2 s 1 2 n 1 + s 2 2 n 2 , t ∼ t ( d f W ) under H 0 t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \quad t \sim t(df_W) \text{ under } H_0 t = n 1 s 1 2 + n 2 s 2 2 x ˉ 1 − x ˉ 2 , t ∼ t ( d f W ) under H 0
d f W = ( s 1 2 n 1 + s 2 2 n 2 ) 2 ( s 1 2 / n 1 ) 2 n 1 − 1 + ( s 2 2 / n 2 ) 2 n 2 − 1 df_W = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}} d f W = n 1 − 1 ( s 1 2 / n 1 ) 2 + n 2 − 1 ( s 2 2 / n 2 ) 2 ( n 1 s 1 2 + n 2 s 2 2 ) 2 As GLM :
y i ∼ N ( μ i , σ 2 ) , μ i = β 0 + β 1 g i y_i \sim \mathcal{N}(\mu_i, \sigma^2), \quad \mu_i = \beta_0 + \beta_1 g_i y i ∼ N ( μ i , σ 2 ) , μ i = β 0 + β 1 g i
Var ( β ^ 1 ) = s 1 2 n 1 + s 2 2 n 2 , t = β ^ 1 Var ( β ^ 1 ) ∼ t ( d f W ) under H 0 : β 1 = 0 \text{Var}(\hat{\beta}_1) = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}, \quad t = \frac{\hat{\beta}_1}{\sqrt{\text{Var}(\hat{\beta}_1)}} \sim t(df_W) \text{ under } H_0: \beta_1 = 0 Var ( β ^ 1 ) = n 1 s 1 2 + n 2 s 2 2 , t = Var ( β ^ 1 ) β ^ 1 ∼ t ( d f W ) under H 0 : β 1 = 0 Same structural model as the pooled t-test — only the variance estimator and degrees of freedom change.
scipy ¶ # Compare Adelie vs Gentoo bill length (species with different variances)
adelie = penguins.filter(pl.col("species") == "Adelie")["bill_length_mm"].to_numpy()
gentoo = penguins.filter(pl.col("species") == "Gentoo")["bill_length_mm"].to_numpy()
scipy_welch = ttest_ind(gentoo, adelie, equal_var=False)
scipy_welchTtestResult(statistic=np.float64(24.286066500471392), pvalue=np.float64(7.821528746388473e-66), df=np.float64(233.5085891130877))
bossanova ¶ df_species = penguins.filter(pl.col("species").is_in(["Adelie", "Gentoo"])).select("bill_length_mm", "species")
m_welch = model("bill_length_mm ~ species", df_species).fit().infer(errors="unequal_var")
m_welch.params[1].select("statistic", "df", "p_value")bn_t_w = float(m_welch.params["statistic"][1])
bn_p_w = float(m_welch.params["p_value"][1])
bn_df = float(m_welch.params["df"][1])
# Welch t, p, df should all match scipy exactly
assert np.isclose(abs(bn_t_w), abs(scipy_welch.statistic), rtol=1e-3), f"t mismatch: {bn_t_w} vs {scipy_welch.statistic}"
assert np.isclose(bn_p_w, scipy_welch.pvalue, rtol=1e-3), f"p mismatch: {bn_p_w} vs {scipy_welch.pvalue}"
assert np.isclose(bn_df, scipy_welch.df, rtol=1e-3), f"df mismatch: {bn_df} vs {scipy_welch.df}"
Mann-Whitney U Test ¶ Classical:
U = n 1 n 2 + n 1 ( n 1 + 1 ) 2 − R 1 , z = U − n 1 n 2 / 2 n 1 n 2 ( n 1 + n 2 + 1 ) / 12 ∼ ˙ N ( 0 , 1 ) under H 0 U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1, \quad z = \frac{U - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1+n_2+1)/12}} \dot{\sim} \mathcal{N}(0,1) \text{ under } H_0 U = n 1 n 2 + 2 n 1 ( n 1 + 1 ) − R 1 , z = n 1 n 2 ( n 1 + n 2 + 1 ) /12 U − n 1 n 2 /2 ∼ ˙ N ( 0 , 1 ) under H 0 where R 1 R_1 R 1 is the sum of ranks in group 1.
As GLM :
y i ∗ ∼ N ( μ i , σ 2 ) , μ i = β 0 + β 1 g i , where y i ∗ = rank ( y i ) y_i^* \sim \mathcal{N}(\mu_i, \sigma^2), \quad \mu_i = \beta_0 + \beta_1 g_i, \quad \text{where } y_i^* = \text{rank}(y_i) y i ∗ ∼ N ( μ i , σ 2 ) , μ i = β 0 + β 1 g i , where y i ∗ = rank ( y i )
t = β ^ 1 SE ( β ^ 1 ) ∼ t ( n 1 + n 2 − 2 ) under H 0 : β 1 = 0 t = \frac{\hat{\beta}_1}{\text{SE}(\hat{\beta}_1)} \sim t(n_1+n_2-2) \text{ under } H_0: \beta_1 = 0 t = SE ( β ^ 1 ) β ^ 1 ∼ t ( n 1 + n 2 − 2 ) under H 0 : β 1 = 0 The rank transformation makes inference robust to outliers and non-normality. The GLM replaces the U U U statistic with a t t t -test on ranks — both test the same null of equal group locations.
scipy ¶ from scipy.stats import mannwhitneyu
male_mass = penguins.filter(pl.col("sex") == "male")["body_mass_g"].to_numpy()
female_mass = penguins.filter(pl.col("sex") == "female")["body_mass_g"].to_numpy()
scipy_mw = mannwhitneyu(male_mass, female_mass, alternative='two-sided')
scipy_mwMannwhitneyuResult(statistic=np.float64(20845.5), pvalue=np.float64(1.8133343032461053e-15))
bossanova ¶ m_mw = model("rank(body_mass_g) ~ sex", penguins).fit().infer()
m_mw.params[1].select("statistic", "p_value")scipy reports the Mann-Whitney U statistic; bossanova reports a t-statistic on ranks. The test statistics differ but both test H₀: equal group locations and yield equivalent p-values.
bn_p_mw = float(m_mw.params["p_value"][1])
# Both methods should strongly reject (large sex difference in body mass)
assert bn_p_mw < 1e-10, f"Expected highly significant bossanova p-value, got {bn_p_mw}"
assert scipy_mw.pvalue < 1e-10, f"Expected highly significant scipy p-value, got {scipy_mw.pvalue}"
# Normal approximations should be in the same ballpark
assert np.isclose(bn_p_mw, scipy_mw.pvalue, rtol=0.5), f"p-value gap too large: {bn_p_mw} vs {scipy_mw.pvalue}"