Correlation Tests

UCSD Psychology
| Classical Test | bossanova Equivalent | Relationship Type |
|---|---|---|
| Pearson correlation | `model("zscore(y) ~ zscore(x)", df)` | Linear |
| Spearman correlation | `model("zscore(rank(y)) ~ zscore(rank(x))", df)` | Monotonic |

When both variables are standardized, the regression slope equals the correlation coefficient. bossanova provides zscore() and other expressions for common data transformations.
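The identity can be checked without bossanova. The following is a minimal sketch on synthetic data using only NumPy; the arrays and the local `zscore` helper are illustrative assumptions and are not bossanova's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)


def zscore(v):
    """Illustrative helper: center to mean 0, scale to sample SD 1."""
    return (v - v.mean()) / v.std(ddof=1)


# Ordinary least squares of zscore(y) on zscore(x).
# np.polyfit with deg=1 returns the coefficients as [slope, intercept].
slope, intercept = np.polyfit(zscore(x), zscore(y), deg=1)

# Pearson correlation of the raw (unstandardized) variables.
r = np.corrcoef(x, y)[0, 1]

print(f"slope = {slope:.6f}, r = {r:.6f}, intercept = {intercept:.2e}")
```

The slope and `r` should agree up to floating-point error, and the intercept should be numerically zero because both standardized variables have mean 0.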

Pearson Correlation

Classical:

$$
r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}, \quad t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t(n-2) \text{ under } H_0: \rho = 0
$$

As GLM:

$$
z_y \sim \mathcal{N}(\mu, \sigma^2), \quad \mu = \beta_0 + \beta_1 z_x
$$

$$
\hat{\beta}_1 = r, \quad t_{\beta_1} = t_r
$$

where $z_x$ and $z_y$ are the standardized variables. When both variables are z-scored, the slope equals $r$ and the two $t$-tests are identical.
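The equality of the test statistics can also be verified numerically. The sketch below uses `scipy.stats.linregress` as a stand-in for the GLM fit (an assumption made for illustration; it is not how bossanova fits the model) and compares the slope's $t$-statistic and p-value with those of the classical test on synthetic data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)

# Standardize both variables (sample standard deviation, ddof=1).
zx = stats.zscore(x, ddof=1)
zy = stats.zscore(y, ddof=1)

# Simple linear regression of z_y on z_x; t = slope / standard error.
fit = stats.linregress(zx, zy)
t_slope = fit.slope / fit.stderr

# Classical Pearson test: t = r * sqrt(n - 2) / sqrt(1 - r^2).
r, p_r = stats.pearsonr(x, y)
t_r = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)

print(f"slope = {fit.slope:.6f}, r = {r:.6f}")
print(f"t (slope) = {t_slope:.6f}, t (r) = {t_r:.6f}")
print(f"p (slope) = {fit.pvalue:.6g}, p (r) = {p_r:.6g}")
```

Each pair of printed values should match up to floating-point error, since both procedures test the same hypothesis with $n - 2$ degrees of freedom.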

scipy

from scipy.stats import pearsonr

scipy_pearson = pearsonr(penguins["bill_length_mm"].to_numpy(), penguins["flipper_length_mm"].to_numpy())
scipy_pearson
PearsonRResult(statistic=np.float64(0.6561813407464278), pvalue=np.float64(1.7439736176207624e-43))

bossanova

m = model("zscore(flipper_length_mm) ~ zscore(bill_length_mm)", penguins).fit().infer()

m.params.select("estimate", "statistic", "p_value")

Spearman Rank Correlation

Classical:

$$
\rho = r(\text{rank}(x), \text{rank}(y)), \quad t = \frac{\rho\sqrt{n-2}}{\sqrt{1-\rho^2}} \sim t(n-2) \text{ under } H_0: \rho_s = 0
$$

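That is, Spearman's $\rho$ is the Pearson correlation computed on the ranks of $x$ and $y$; the $t(n-2)$ reference distribution is an approximation in this case.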
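As a quick check of this definition, the sketch below compares `scipy.stats.spearmanr` with `scipy.stats.pearsonr` applied to ranks from `scipy.stats.rankdata`. The data are synthetic and deliberately contain ties; average ranks for ties (the `rankdata` default) are assumed here, and whether bossanova's `rank()` breaks ties the same way is not confirmed by this page.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Integer-valued x creates ties on purpose.
x = rng.integers(0, 20, size=80).astype(float)
y = x + rng.normal(scale=5.0, size=80)

# Spearman correlation computed directly.
rho, _ = stats.spearmanr(x, y)

# Pearson correlation of the ranks (ties receive their average rank).
r_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))

print(f"spearmanr = {rho:.6f}, pearsonr on ranks = {r_ranks:.6f}")
```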

As GLM:

$$
z_{\text{rank}(y)} \sim \mathcal{N}(\mu, \sigma^2), \quad \mu = \beta_0 + \beta_1 z_{\text{rank}(x)}
$$

$$
\hat{\beta}_1 = \rho, \quad t_{\beta_1} = t_\rho
$$

Spearman correlation captures monotonic relationships, so it is less sensitive than Pearson correlation to outliers and to trends that are non-linear but monotonic. The same GLM trick applies: z-score the ranks, and the slope equals $\rho$.
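To illustrate both points, here is a minimal sketch on a synthetic, strongly non-linear but nearly monotonic relationship. As before, `scipy.stats.linregress` stands in for the GLM fit; this is an assumption for illustration, not bossanova's implementation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=150)
# Exponential trend plus a little noise: monotonic overall, far from linear.
y = np.exp(2 * x) + rng.normal(scale=0.05, size=150)

rho, p_rho = stats.spearmanr(x, y)
r, _ = stats.pearsonr(x, y)

# GLM form: regress z-scored ranks of y on z-scored ranks of x.
zrx = stats.zscore(stats.rankdata(x), ddof=1)
zry = stats.zscore(stats.rankdata(y), ddof=1)
fit = stats.linregress(zrx, zry)

print(f"Pearson r    = {r:.4f}")
print(f"Spearman rho = {rho:.4f}")
print(f"slope on z-scored ranks = {fit.slope:.4f}")
```

On data like these, Spearman's $\rho$ should come out well above Pearson's $r$, because the few very large values of `y` dominate the linear fit, while the slope of the rank regression should reproduce $\rho$ up to floating-point error.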

scipy

from scipy.stats import spearmanr

scipy_spearman = spearmanr(penguins["bill_length_mm"].to_numpy(), penguins["flipper_length_mm"].to_numpy())
scipy_spearman
SignificanceResult(statistic=np.float64(0.6727719416255543), pvalue=np.float64(2.0669356276079203e-46))

bossanova

m_spearman = model("zscore(rank(flipper_length_mm)) ~ zscore(rank(bill_length_mm))", penguins).fit().infer()

m_spearman.params.select("estimate", "statistic", "p_value")