
| Classical Test | bossanova Equivalent | Variance Assumption |
| --- | --- | --- |
| Independent t-test | model("y ~ group", df) | Equal (pooled) |
| Welch's t-test | model("y ~ group", df).infer(errors="unequal_var") | Unequal (Welch df) |
| Mann-Whitney U | model("rank(y) ~ group", df) | Robust to outliers |

All examples use the included penguins dataset.

Independent t-test (Equal Variances)

Classical:

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}, \quad t \sim t(n_1+n_2-2) \text{ under } H_0
$$

As GLM:

$$
y_i \sim \mathcal{N}(\mu_i, \sigma^2), \quad \mu_i = \beta_0 + \beta_1 g_i, \quad g_i \in \{0, 1\}
$$

$$
t = \frac{\hat{\beta}_1}{\text{SE}(\hat{\beta}_1)} \sim t(n_1+n_2-2) \text{ under } H_0: \beta_1 = 0
$$

The slope $\hat{\beta}_1 = \bar{x}_1 - \bar{x}_2$ is the group mean difference; $t_{\text{classical}} = t_{\beta_1}$ exactly.
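The equivalence can be checked numerically without any modeling library. The following is a minimal sketch, assuming synthetic normal data (not the penguins dataset) and a hand-rolled least-squares fit with NumPy; the variable names are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic two-group data (illustrative only, not the penguins dataset)
y0 = rng.normal(loc=10.0, scale=2.0, size=40)  # group coded g = 0
y1 = rng.normal(loc=11.0, scale=2.0, size=55)  # group coded g = 1

# Classical pooled-variance t-test
classical = stats.ttest_ind(y1, y0, equal_var=True)

# Same test as a linear model: y = b0 + b1 * g + error
y = np.concatenate([y0, y1])
g = np.concatenate([np.zeros(y0.size), np.ones(y1.size)])
X = np.column_stack([np.ones(y.size), g])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta
dof = y.size - X.shape[1]                   # n1 + n2 - 2
sigma2 = resid @ resid / dof                # residual variance, equal to s_p^2
cov_beta = sigma2 * np.linalg.inv(X.T @ X)  # covariance of the coefficients
t_slope = beta[1] / np.sqrt(cov_beta[1, 1])
p_slope = 2 * stats.t.sf(abs(t_slope), dof)

print(classical.statistic, t_slope)
print(classical.pvalue, p_slope)
```

The two printed pairs should agree to floating-point precision, since the residual variance of the dummy-coded regression is the pooled variance.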

scipy

import polars as pl
from scipy.stats import ttest_ind

# `penguins` is assumed to be already loaded as a polars DataFrame

male_bill = penguins.filter(pl.col("sex") == "male")["bill_depth_mm"].to_numpy()
female_bill = penguins.filter(pl.col("sex") == "female")["bill_depth_mm"].to_numpy()

scipy_ttest_eq = ttest_ind(male_bill, female_bill, equal_var=True)
scipy_ttest_eq
TtestResult(statistic=np.float64(7.306540245129378), pvalue=np.float64(2.066410345755146e-12), df=np.float64(331.0))

bossanova

m = model("bill_depth_mm ~ sex", penguins).fit().infer()

m.params[1].select("statistic", "df", "p_value")

Welch’s t-test (Unequal Variances)

Classical:

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \quad t \sim t(df_W) \text{ under } H_0
$$

$$
df_W = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}
$$

As GLM:

$$
y_i \sim \mathcal{N}(\mu_i, \sigma_{g_i}^2), \quad \mu_i = \beta_0 + \beta_1 g_i
$$

$$
\text{Var}(\hat{\beta}_1) = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}, \quad t = \frac{\hat{\beta}_1}{\sqrt{\text{Var}(\hat{\beta}_1)}} \sim t(df_W) \text{ under } H_0: \beta_1 = 0
$$

Same mean structure as the pooled t-test. Each group is allowed its own variance, so only the variance estimator and the degrees of freedom change.
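The Welch variance and degrees-of-freedom formulas above can be evaluated directly and compared with scipy. This is a minimal sketch on synthetic data with deliberately unequal variances; it does not use bossanova, and the names `v0`, `v1`, and `df_w` are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic groups with different spreads (illustrative only)
y0 = rng.normal(loc=10.0, scale=1.0, size=30)
y1 = rng.normal(loc=11.0, scale=3.0, size=50)

# Reference: scipy's Welch t-test
welch = stats.ttest_ind(y1, y0, equal_var=False)

# Slope of the dummy-coded model is the difference in group means
beta1 = y1.mean() - y0.mean()

# Per-group contributions s_k^2 / n_k to Var(beta1)
v0 = y0.var(ddof=1) / y0.size
v1 = y1.var(ddof=1) / y1.size

# Welch-Satterthwaite degrees of freedom
df_w = (v0 + v1) ** 2 / (v0 ** 2 / (y0.size - 1) + v1 ** 2 / (y1.size - 1))

t_w = beta1 / np.sqrt(v0 + v1)
p_w = 2 * stats.t.sf(abs(t_w), df_w)

print(welch.statistic, t_w)
print(welch.pvalue, p_w)
print(df_w)
```

The coefficient itself is unchanged from the pooled case; only its standard error and the reference distribution differ.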

scipy

# Compare Adelie vs Gentoo bill length (species with different variances)
adelie = penguins.filter(pl.col("species") == "Adelie")["bill_length_mm"].to_numpy()
gentoo = penguins.filter(pl.col("species") == "Gentoo")["bill_length_mm"].to_numpy()

scipy_welch = ttest_ind(gentoo, adelie, equal_var=False)
scipy_welch
TtestResult(statistic=np.float64(24.286066500471392), pvalue=np.float64(7.821528746388473e-66), df=np.float64(233.5085891130877))

bossanova

df_species = penguins.filter(pl.col("species").is_in(["Adelie", "Gentoo"])).select("bill_length_mm", "species")
m_welch = model("bill_length_mm ~ species", df_species).fit().infer(errors="unequal_var")

m_welch.params[1].select("statistic", "df", "p_value")

Mann-Whitney U Test

Classical:

$$
U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1, \quad z = \frac{U - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1+n_2+1)/12}} \dot{\sim} \mathcal{N}(0,1) \text{ under } H_0
$$

where $R_1$ is the sum of ranks in group 1.

As GLM:

$$
y_i^* \sim \mathcal{N}(\mu_i, \sigma^2), \quad \mu_i = \beta_0 + \beta_1 g_i, \quad \text{where } y_i^* = \text{rank}(y_i)
$$

$$
t = \frac{\hat{\beta}_1}{\text{SE}(\hat{\beta}_1)} \sim t(n_1+n_2-2) \text{ under } H_0: \beta_1 = 0
$$

The rank transformation makes inference robust to outliers and non-normality. The GLM replaces the $U$ statistic with a $t$-test on ranks; both test the same null of equal group locations.
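To see how closely the two approaches track each other, the sketch below ranks synthetic skewed data jointly, computes $U$ from the rank sum, and runs a pooled t-test on the ranks. It uses only scipy (no bossanova), the data are made up for illustration, and it assumes a recent SciPy in which `mannwhitneyu` reports the statistic for its first sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Synthetic, skewed two-group data (illustrative only)
x = rng.lognormal(mean=0.0, sigma=1.0, size=40)  # group 1
y = rng.lognormal(mean=0.5, sigma=1.0, size=50)  # group 2
n1, n2 = x.size, y.size

# Rank all observations jointly, then split the ranks back by group
ranks = stats.rankdata(np.concatenate([x, y]))
r_x, r_y = ranks[:n1], ranks[n1:]

R1 = r_x.sum()                            # sum of ranks in group 1
u_doc = n1 * n2 + n1 * (n1 + 1) / 2 - R1  # U as defined in the formula above
u_x = R1 - n1 * (n1 + 1) / 2              # complementary U, so u_x + u_doc = n1 * n2

# Classical test (recent SciPy is assumed to report the U of the first sample)
mw = stats.mannwhitneyu(x, y, alternative="two-sided")

# Pooled t-test on ranks, i.e. the slope test of rank(y) ~ group
rank_t = stats.ttest_ind(r_x, r_y, equal_var=True)

print(mw.statistic, u_x, u_doc)
print(mw.pvalue, rank_t.pvalue)
```

The two p-values are typically close but not identical, because the t-test on ranks is an approximation to the Mann-Whitney test rather than a re-expression of it.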

scipy

from scipy.stats import mannwhitneyu

male_mass = penguins.filter(pl.col("sex") == "male")["body_mass_g"].to_numpy()
female_mass = penguins.filter(pl.col("sex") == "female")["body_mass_g"].to_numpy()

scipy_mw = mannwhitneyu(male_mass, female_mass, alternative='two-sided')
scipy_mw
MannwhitneyuResult(statistic=np.float64(20845.5), pvalue=np.float64(1.8133343032461053e-15))

bossanova

m_mw = model("rank(body_mass_g) ~ sex", penguins).fit().infer()

m_mw.params[1].select("statistic", "p_value")

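scipy reports the Mann-Whitney U statistic; bossanova reports a t-statistic on ranks. The test statistics are on different scales, but both test H₀: equal group locations. The p-values are close rather than identical, because the t-test on ranks approximates the Mann-Whitney test, and the agreement improves as the sample size grows.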