Classical Test bossanova Equivalent Relationship Type Pearson correlation model("zscore(y) ~ zscore(x)", df)Linear Spearman correlation model("zscore(rank(y)) ~ zscore(rank(x))", df)Monotonic
When both variables are standardized, the regression slope equals the correlation coefficient. bossanova provides zscore() and other expressions for common data transformations.
import numpy as np
import polars as pl
from bossanova.model import model
from bossanova import load_dataset
np.random.seed(42)
# Load penguins dataset
penguins = load_dataset("penguins").drop_nulls()
Pearson Correlation ¶ Classical:
r = ∑ i = 1 n ( x i − x ˉ ) ( y i − y ˉ ) ∑ i = 1 n ( x i − x ˉ ) 2 ∑ i = 1 n ( y i − y ˉ ) 2 , t = r n − 2 1 − r 2 ∼ t ( n − 2 ) under H 0 : ρ = 0 r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}, \quad t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t(n-2) \text{ under } H_0: \rho = 0 r = ∑ i = 1 n ( x i − x ˉ ) 2 ∑ i = 1 n ( y i − y ˉ ) 2 ∑ i = 1 n ( x i − x ˉ ) ( y i − y ˉ ) , t = 1 − r 2 r n − 2 ∼ t ( n − 2 ) under H 0 : ρ = 0 As GLM :
z y ∼ N ( μ , σ 2 ) , μ = β 0 + β 1 z x z_y \sim \mathcal{N}(\mu, \sigma^2), \quad \mu = \beta_0 + \beta_1 z_x z y ∼ N ( μ , σ 2 ) , μ = β 0 + β 1 z x
β ^ 1 = r , t β 1 = t r \hat{\beta}_1 = r, \quad t_{\beta_1} = t_r β ^ 1 = r , t β 1 = t r where z x z_x z x and z y z_y z y are standardized variables. When both variables are z-scored, the slope equals r r r and the t t t -tests are identical.
scipy ¶ from scipy.stats import pearsonr
scipy_pearson = pearsonr(penguins["bill_length_mm"].to_numpy(), penguins["flipper_length_mm"].to_numpy())
scipy_pearsonPearsonRResult(statistic=np.float64(0.6561813407464278), pvalue=np.float64(1.7439736176207624e-43))
bossanova ¶ m = model("zscore(flipper_length_mm) ~ zscore(bill_length_mm)", penguins).fit().infer()
m.params.select("estimate", "statistic", "p_value")bn_slope = float(m.params["estimate"][1])
bn_p = float(m.params["p_value"][1])
assert np.isclose(bn_slope, scipy_pearson.statistic, rtol=1e-3), f"r mismatch: {bn_slope} vs {scipy_pearson.statistic}"
assert np.isclose(bn_p, scipy_pearson.pvalue, rtol=1e-3), f"p mismatch: {bn_p} vs {scipy_pearson.pvalue}"
Spearman Rank Correlation ¶ Classical:
ρ = r ( rank ( x ) , rank ( y ) ) , t = ρ n − 2 1 − ρ 2 ∼ t ( n − 2 ) under H 0 : ρ s = 0 \rho = r(\text{rank}(x), \text{rank}(y)), \quad t = \frac{\rho\sqrt{n-2}}{\sqrt{1-\rho^2}} \sim t(n-2) \text{ under } H_0: \rho_s = 0 ρ = r ( rank ( x ) , rank ( y )) , t = 1 − ρ 2 ρ n − 2 ∼ t ( n − 2 ) under H 0 : ρ s = 0 Pearson correlation applied to ranks.
As GLM :
z rank ( y ) ∼ N ( μ , σ 2 ) , μ = β 0 + β 1 z rank ( x ) z_{\text{rank}(y)} \sim \mathcal{N}(\mu, \sigma^2), \quad \mu = \beta_0 + \beta_1 z_{\text{rank}(x)} z rank ( y ) ∼ N ( μ , σ 2 ) , μ = β 0 + β 1 z rank ( x )
β ^ 1 = ρ , t β 1 = t ρ \hat{\beta}_1 = \rho, \quad t_{\beta_1} = t_\rho β ^ 1 = ρ , t β 1 = t ρ Spearman correlation captures monotonic relationships and is robust to outliers and non-linearity. The same GLM trick applies — z-score the ranks, and the slope equals ρ \rho ρ .
scipy ¶ from scipy.stats import spearmanr
scipy_spearman = spearmanr(penguins["bill_length_mm"].to_numpy(), penguins["flipper_length_mm"].to_numpy())
scipy_spearmanSignificanceResult(statistic=np.float64(0.6727719416255543), pvalue=np.float64(2.0669356276079203e-46))
bossanova ¶ m_spearman = model("zscore(rank(flipper_length_mm)) ~ zscore(rank(bill_length_mm))", penguins).fit().infer()
m_spearman.params.select("estimate", "statistic", "p_value")bn_rho = float(m_spearman.params["estimate"][1])
bn_p_sp = float(m_spearman.params["p_value"][1])
# With zscore(rank()), the slope equals Spearman's rho exactly
assert np.isclose(bn_rho, scipy_spearman.statistic, rtol=1e-3), f"rho mismatch: {bn_rho} vs {scipy_spearman.statistic}"
assert np.isclose(bn_p_sp, scipy_spearman.pvalue, rtol=1e-3), f"p mismatch: {bn_p_sp} vs {scipy_spearman.pvalue}"