Count Data Tests (Chi-Square)

UCSD Psychology
| Classical Test | bossanova Equivalent | Model Comparison |
| --- | --- | --- |
| Chi-square goodness of fit | compare(null, full) | Intercept-only vs category model |
| Chi-square independence | compare(main_effects, interaction) | Main effects vs interaction |
| Multi-way tables | model("count ~ a * b * c", df, family="poisson") | Hierarchical model comparisons |

A Poisson GLM with a log link provides a unified framework for all chi-square tests: each test becomes a likelihood ratio comparison between two nested models, run with compare().

Chi-Square Goodness of Fit

Classical:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \quad \chi^2 \sim \chi^2(k-1) \text{ under } H_0 \text{ (equal proportions)}$$

As GLM:

$$y_i \sim \text{Poisson}(\mu_i), \quad \log(\mu_i) = \beta_0 + \beta_1 x_{1i} + \cdots$$

$$G^2 = 2\sum_{i=1}^{k} O_i \log\!\left(\frac{O_i}{\hat{E}_i}\right) \dot{\sim} \chi^2(k-1) \text{ under } H_0: \beta_1 = \cdots = 0$$

The Pearson $\chi^2$ and the likelihood ratio $G^2$ are asymptotically equivalent. bossanova reports $G^2$ via compare().
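As a rough illustration of that equivalence, the sketch below computes both statistics by hand for a made-up set of counts. It then recovers $G^2$ as twice the log-likelihood gap between a saturated Poisson model and an intercept-only one. The counts and the helper name `poisson_loglik` are hypothetical, and the code uses only the standard library rather than bossanova.

```python
import math

# Hypothetical counts for k = 3 categories (not the data used on this page)
observed = [40, 60, 50]
k = len(observed)
expected = sum(observed) / k  # equal proportions under H0

# Pearson chi-square and likelihood ratio G^2, straight from the formulas
pearson = sum((o - expected) ** 2 / expected for o in observed)
g2 = 2 * sum(o * math.log(o / expected) for o in observed)


def poisson_loglik(counts, means):
    """Poisson log-likelihood of the counts given fitted means."""
    return sum(
        y * math.log(mu) - mu - math.lgamma(y + 1)
        for y, mu in zip(counts, means)
    )


# Intercept-only model: every fitted mean equals the average count.
ll_null = poisson_loglik(observed, [expected] * k)
# Category model with k parameters is saturated: fitted means equal the counts.
ll_full = poisson_loglik(observed, observed)

lrt = 2 * (ll_full - ll_null)  # likelihood ratio statistic
df = k - 1

print(f"Pearson chi2 = {pearson:.4f}, G2 = {g2:.4f}, LRT = {lrt:.4f}, df = {df}")
```

The LRT and $G^2$ agree up to floating-point error, while the Pearson statistic is close but not identical. For a cross-check, `scipy.stats.power_divergence(observed, lambda_="log-likelihood")` computes the same $G^2$.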

scipy

from scipy.stats import chisquare

scipy_chi2 = chisquare(observed)
scipy_chi2
Power_divergenceResult(statistic=np.float64(28.270270270270274), pvalue=np.float64(7.264217011785267e-07))

bossanova

df = pl.DataFrame({"count": observed, "species": categories})

m_null = model("count ~ 1", df, family="poisson").fit()
m_full = model("count ~ species", df, family="poisson").fit()

compare(m_null, m_full)

Chi-Square Test of Independence

Classical:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad \chi^2 \sim \chi^2((r-1)(c-1)) \text{ under } H_0 \text{ (independence)}$$

As GLM:

$$y_{ij} \sim \text{Poisson}(\mu_{ij}), \quad \log(\mu_{ij}) = \beta_0 + \alpha_i + \gamma_j + (\alpha\gamma)_{ij}$$

Independence means no interaction term. The LRT comparing the main-effects model ($\log\mu_{ij} = \beta_0 + \alpha_i + \gamma_j$) to the saturated model tests $H_0: (\alpha\gamma)_{ij} = 0$ for all $i,j$.
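For a two-way table the main-effects Poisson fit has a closed form, $\hat{E}_{ij} = n_{i+} n_{+j} / N$, so the LRT against the saturated model can be checked by hand. The sketch below does this for a made-up 2 × 3 table. The counts are hypothetical and the code is standard-library Python, not the bossanova fit itself.

```python
import math

# Hypothetical 2 x 3 contingency table (rows x columns), not this page's data
table = [
    [20, 30, 50],
    [30, 30, 40],
]

row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]
n = sum(row_tot)

# Fitted counts under independence (the main-effects model)
expected = [[r * c / n for c in col_tot] for r in row_tot]

# Flatten observed/expected pairs cell by cell
cells = [
    (o, e)
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
]

pearson = sum((o - e) ** 2 / e for o, e in cells)
# Cells with o == 0 contribute nothing to G^2, so skip them to avoid log(0)
g2 = 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)
df = (len(row_tot) - 1) * (len(col_tot) - 1)

print(f"Pearson chi2 = {pearson:.4f}, G2 = {g2:.4f}, df = {df}")
```

The saturated model reproduces the observed counts exactly, so its deviance is zero and $G^2$ here is the whole LRT statistic.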

scipy

from scipy.stats import chi2_contingency

scipy_indep = chi2_contingency(observed_table)
pl.DataFrame({
    "statistic": [scipy_indep.statistic],
    "p_value": [scipy_indep.pvalue],
    "df": [scipy_indep.dof]
})

bossanova

# Null model (independence = main effects only)
m_null = model("count ~ species + sex", contingency, family="poisson").fit()
# Full model (saturated, with interaction)
m_full = model("count ~ species * sex", contingency, family="poisson").fit()

compare(m_null, m_full)

Multi-Factor Log-Linear Models

For tables with three or more factors, the GLM framework extends naturally, whereas the classical chi-square tests have no simple equivalent.
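To make the extension concrete, the sketch below works through the simplest three-way case, mutual independence, which is what a main-effects-only formula expresses. When every cell of the table is present, its fitted counts are $\hat{\mu}_{ijk} = n_{i++}\, n_{+j+}\, n_{++k} / N^2$, and $G^2$ against the saturated model has $IJK - 1 - (I-1) - (J-1) - (K-1)$ degrees of freedom. The 2 × 2 × 2 counts are hypothetical and the code is standard-library Python, not bossanova.

```python
import math
from itertools import product

# Hypothetical 2 x 2 x 2 table indexed as counts[i][j][k]
counts = [
    [[10, 20], [30, 40]],
    [[15, 25], [35, 25]],
]
I, J, K = 2, 2, 2
cells = list(product(range(I), range(J), range(K)))
n = sum(counts[i][j][k] for i, j, k in cells)

# One-way marginal totals for each factor
a_tot = [sum(counts[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
b_tot = [sum(counts[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
c_tot = [sum(counts[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]

# Fitted counts under mutual independence (main effects only)
fitted = {(i, j, k): a_tot[i] * b_tot[j] * c_tot[k] / n**2 for i, j, k in cells}

# Likelihood ratio statistic against the saturated model
g2 = 2 * sum(
    counts[i][j][k] * math.log(counts[i][j][k] / fitted[i, j, k])
    for i, j, k in cells
    if counts[i][j][k] > 0
)
# Saturated parameters minus main-effects parameters
df = I * J * K - 1 - (I - 1) - (J - 1) - (K - 1)

print(f"G2 = {g2:.4f}, df = {df}")
```

Intermediate hierarchical models, such as those adding some two-way interactions, generally need an iterative fit, which is where the GLM machinery takes over.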

bossanova

# Three-way table: species × island × sex
counts_3way = penguins.group_by("species", "island", "sex").agg(pl.len().alias("count"))

m = model("count ~ species + island + sex", counts_3way, family="poisson").fit().infer()

m.params.select("term", "estimate", "p_value")