
Theorem Reference

UCSD Psychology

Mathematical Theorems

This page documents the mathematical theorems verified by bossanova’s property-based tests. Each theorem is a statement that must hold for any valid input, verified automatically by Hypothesis across thousands of random test cases.

How to Read This Page

Each theorem includes:

- a formal statement of the property,
- a 💡 Intuition explaining why it holds,
- its prerequisites (Depends) and the theorems that build on it (Enables),
- the Hypothesis test that verifies it (Test), and
- a literature reference.

Theorem Dependency Graph

The following diagram shows how theorems build on each other. Arrows indicate dependencies (A → B means “A is required for B”).

Linear Algebra (LA)

Matrix decompositions and fundamental operations

LA.4: $\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$ (SVD reconstruction)

The singular value decomposition $\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$ always exists and allows exact reconstruction of the original matrix.

💡 Intuition

SVD decomposes $\mathbf{X}$ into orthogonal matrices $\mathbf{U}$, $\mathbf{V}$ and diagonal singular values $\boldsymbol{\Sigma}$. The product $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$ reconstructs $\mathbf{X}$ exactly (up to numerical precision).

Enables: LA.5, OLS.1

Test: test_linalg_hypothesis.py::TestSVDReconstruction::test_svd_reconstruction

Reference: Golub & Van Loan, 2013
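A minimal NumPy sketch of this property (not bossanova's actual Hypothesis test; the seed and matrix shape are illustrative):

```python
import numpy as np

# A hypothetical rectangular matrix; the real test draws many such matrices.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# Thin SVD: U is 5x3 with orthonormal columns, s holds 3 singular values, Vt is 3x3.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_rec = U @ np.diag(s) @ Vt  # reconstruct X = U Sigma V^T
```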


LA.5: $\mathbf{X}\mathbf{X}^+\mathbf{X} = \mathbf{X}$ (Moore–Penrose property 1)

The pseudoinverse $\mathbf{X}^+$ acts as a “generalized inverse”: applying $\mathbf{X}^+$ and then $\mathbf{X}$ returns the original value (on the column space).

💡 Intuition

$\mathbf{X}^+$ inverts $\mathbf{X}$ on its column space and projects to zero on the orthogonal complement. Applying $\mathbf{X}$ again maps back to the column space, recovering $\mathbf{X}$ exactly.

Depends: LA.4

Enables: OLS.1, LMM.1

Test: test_linalg_hypothesis.py::TestMoorePenrosePseudoinverse::test_moore_penrose_property_1

Reference: Golub & Van Loan, 2013
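The property can be checked directly with NumPy's SVD-based pseudoinverse (a sketch with an illustrative matrix, not the library's test):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))   # tall matrix, generically full column rank

X_pinv = np.linalg.pinv(X)    # Moore-Penrose pseudoinverse via SVD
```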


LA.7: $\log|\mathbf{A}| = 2\sum_i \log(L_{ii})$ where $\mathbf{A} = \mathbf{L}\mathbf{L}^\top$

For positive definite $\mathbf{A}$ with Cholesky factor $\mathbf{L}$:

$\log|\mathbf{A}| = 2\sum_i \log(L_{ii})$

💡 Intuition

The Cholesky decomposition reduces the determinant computation to O(n) once the O(n³) factorization is done. The product of the diagonal elements of $\mathbf{L}$ gives $|\mathbf{L}|$, and squaring gives $|\mathbf{A}|$.

Depends: LA.3

Enables: LMM.4

Test: test_linalg_hypothesis.py::TestLogDeterminantViaCholesky::test_logdet_cholesky_formula

Reference: Golub & Van Loan, 2013
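A small sketch of the identity, comparing the Cholesky-based log-determinant against NumPy's `slogdet` (the matrix construction is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4.0 * np.eye(4)   # symmetric positive definite by construction

L = np.linalg.cholesky(A)       # A = L L^T, L lower triangular
logdet_chol = 2.0 * np.sum(np.log(np.diag(L)))
sign, logdet_ref = np.linalg.slogdet(A)
```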


Ordinary Least Squares (OLS)

Normal equations and solution properties

OLS.1: $\mathbf{X}^\top(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0}$ via QR solve

At the OLS solution, $\mathbf{X}^\top(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0}$.

💡 Intuition

Residuals represent unexplained variation. Zero correlation with $\mathbf{X}$ means we’ve extracted all linear signal. Geometrically: OLS projects $\mathbf{y}$ onto $\text{col}(\mathbf{X})$, so residuals are perpendicular to this subspace.

Depends: LA.1, LA.2

Enables: OLS.5, DX.1

Test: test_linalg_hypothesis.py::TestNormalEquations::test_normal_equations_qr

Reference: Hastie et al., 2009
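A QR-based sketch of the normal equations (an illustrative design and response, not the library's fitting code):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)

# Solve least squares via QR: X = QR, then R beta = Q^T y
Q, R = np.linalg.qr(X)
beta_hat = np.linalg.solve(R, Q.T @ y)

score = X.T @ (y - X @ beta_hat)   # should vanish at the solution
```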


OLS.4: Each column of X is orthogonal to residuals

This is equivalent to OLS.1 but stated column-by-column.

💡 Intuition

If residuals were correlated with any predictor, we could improve the fit by adjusting that coefficient. Orthogonality means the solution is optimal.

Depends: OLS.1

Test: test_linalg_hypothesis.py::TestResidualOrthogonality::test_residuals_orthogonal_to_columns

Reference: Greene, 2018


OLS.5: $\mathbf{H}^2 = \mathbf{H}$ (projection property)

A projection matrix satisfies $\mathbf{H}^2 = \mathbf{H}$. This is because projecting twice onto the same subspace gives the same result as projecting once.

💡 Intuition

If $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$, then $\mathbf{H}\hat{\mathbf{y}} = \mathbf{H}(\mathbf{H}\mathbf{y}) = \mathbf{H}^2\mathbf{y} = \mathbf{H}\mathbf{y} = \hat{\mathbf{y}}$. Re-projecting the fitted values doesn’t change them.

Test: test_diagnostics_hypothesis.py::TestHatMatrixProperties::test_hat_matrix_idempotent


OLS.6: $\text{tr}(\mathbf{H}) = p$ (trace equals number of parameters)

The trace of a projection matrix equals its rank. For full-rank X, this equals the number of columns p.

💡 Intuition

The trace counts the dimension of the projection subspace. Since $\mathbf{H}$ projects onto $\text{col}(\mathbf{X})$, which has dimension $p$ when $\mathbf{X}$ has full column rank, $\text{tr}(\mathbf{H}) = p$.

Depends: OLS.5

Enables: DX.1

Test: test_diagnostics_hypothesis.py::TestHatMatrixProperties::test_hat_matrix_trace_equals_p

Reference: Hastie et al., 2009


OLS.7: Eigenvalues of H are 0 or 1

A projection matrix has only eigenvalues 0 and 1. For a rank-p projection in n-space, there are p eigenvalues equal to 1 and (n-p) eigenvalues equal to 0.

💡 Intuition

H projects vectors onto a p-dimensional subspace. Vectors in the subspace are unchanged (eigenvalue 1), vectors orthogonal to it are annihilated (eigenvalue 0).

Depends: OLS.5

Test: test_diagnostics_hypothesis.py::TestHatMatrixProperties::test_hat_matrix_eigenvalues_binary

Reference: Golub & Van Loan, 2013
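The three hat-matrix properties (OLS.5, OLS.6, OLS.7) can be checked together in a few lines of NumPy (shapes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 5
X = rng.normal(size=(n, p))   # generically full column rank

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)

# Symmetrize before eigvalsh to absorb tiny numerical asymmetry
eigs = np.sort(np.linalg.eigvalsh((H + H.T) / 2))
```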


Weighted Least Squares (WLS)

IRLS and GLM convergence

WLS.1: $\mathbf{X}^\top\mathbf{W}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0}$ for weighted least squares

At the WLS solution, the weighted score $\mathbf{X}^\top\mathbf{W}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$ equals zero.

💡 Intuition

The weighted residuals should have zero correlation with X when weighted by W. This is the optimality condition for WLS.

Depends: LA.1

Enables: WLS.2, INF.1

Test: test_glm_hypothesis.py::TestWeightedNormalEquations::test_weighted_normal_equations

Reference: McCullagh & Nelder, 1989
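A sketch of the weighted normal equations with hypothetical positive weights (diagonal $\mathbf{W}$ stored as a vector):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)   # hypothetical positive case weights

# WLS solution: beta = (X' W X)^{-1} X' W y, with W = diag(w)
beta_w = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
weighted_score = X.T @ (w * (y - X @ beta_w))
```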


WLS.3: $\mathbf{X}^\top(\mathbf{y} - \boldsymbol{\mu}) \approx \mathbf{0}$ at convergence for binomial GLM

At the MLE, the score (gradient of the log-likelihood) equals zero. For a binomial GLM with canonical logit link, the score is $\mathbf{X}^\top(\mathbf{y} - \boldsymbol{\mu})$.

Note: The general GLM score involves weights, but for the canonical link the weights cancel, giving the simple form $\mathbf{X}^\top(\mathbf{y} - \boldsymbol{\mu})$.

💡 Intuition

The IRLS algorithm minimizes deviance by iteratively solving weighted least squares. At convergence, the gradient is zero.

Test: test_glm_hypothesis.py::TestScoreEquationsAtConvergence::test_binomial_score_at_convergence

Reference: McCullagh & Nelder, 1989
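A plain-NumPy IRLS sketch for logistic regression illustrating the vanishing score at convergence (the data-generating coefficients and iteration count are illustrative, not bossanova's implementation):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.3, 1.0, -0.5])           # hypothetical coefficients
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true))).astype(float)

# IRLS for the canonical logit link
beta = np.zeros(3)
for _ in range(25):
    eta = X @ beta
    mu = 1 / (1 + np.exp(-eta))
    w = mu * (1 - mu)                            # IRLS weights
    z = eta + (y - mu) / w                       # working response
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))

mu = 1 / (1 + np.exp(-(X @ beta)))
score = X.T @ (y - mu)                           # approx zero at convergence
```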


Inference (INF)

Standard errors, confidence intervals, and tests

INF.1: $\text{SE} = \sqrt{\text{diag}(\mathbf{V})}$

Standard errors are the square roots of the diagonal elements of the variance-covariance matrix.

💡 Intuition

The diagonal of $\mathbf{V}$ contains $\text{Var}(\hat{\beta}_j)$, and $\text{SE}_j = \sqrt{\text{Var}(\hat{\beta}_j)}$. This is the fundamental relationship between the covariance matrix and standard errors.

Enables: INF.2, INF.4

Test: test_inference_hypothesis.py::TestStandardErrorsFromVCOV::test_se_equals_sqrt_diag_vcov

Reference: Greene, 2018


INF.2: $(\text{CI}_{\text{lo}} + \text{CI}_{\text{hi}})/2 = \hat{\boldsymbol{\beta}}$

Confidence intervals based on symmetric distributions (t or z) are centered at the point estimate.

💡 Intuition

The CI is constructed as $\hat{\boldsymbol{\beta}} \pm c \times \text{SE}$ for a critical value $c$. The midpoint of $[\hat{\boldsymbol{\beta}} - c \times \text{SE},\ \hat{\boldsymbol{\beta}} + c \times \text{SE}]$ is exactly $\hat{\boldsymbol{\beta}}$.

Depends: INF.1

Enables: EMM.1, INF.4

Test: test_inference_hypothesis.py::TestConfidenceIntervalSymmetry::test_ci_midpoint_equals_estimate

Reference: Casella & Berger, 2002


INF.4: $W = (\mathbf{L}\hat{\boldsymbol{\beta}})^\top(\mathbf{L}\mathbf{V}\mathbf{L}^\top)^{-1}(\mathbf{L}\hat{\boldsymbol{\beta}})$

The Wald statistic for testing $H_0\colon \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$ is a quadratic form in the contrast estimates, weighted by the inverse contrast variance.

💡 Intuition

The Wald statistic measures how many standard errors away from zero the contrast estimates are, accounting for their correlations.

Depends: EMM.2

Enables: INF.5

Test: test_emm_hypothesis.py::TestWaldStatistic::test_wald_formula

Reference: Greene, 2018


INF.5: F = W / q

The F-statistic is the Wald statistic divided by the number of constraints (rows in the contrast matrix).

💡 Intuition

The Wald statistic is approximately chi-squared with q df. Dividing by q gives an F-distributed statistic.

Depends: INF.4

Test: test_emm_hypothesis.py::TestFTestFromWald::test_f_equals_wald_over_q

Reference: Greene, 2018
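A sketch of INF.4 and INF.5 together for an OLS fit, using a hypothetical 2-row contrast matrix (the design, coefficients, and hypothesis are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 60, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.5, 0.0, -0.2]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
V = (resid @ resid / (n - p)) * np.linalg.inv(X.T @ X)   # vcov(beta_hat)

# Contrast matrix for H0: beta_1 = beta_2 = 0 (q = 2 constraints)
L = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
Lb = L @ beta_hat
W = Lb @ np.linalg.solve(L @ V @ L.T, Lb)   # Wald statistic (INF.4)
q = L.shape[0]
F = W / q                                    # F statistic (INF.5)
```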


Diagnostics (DX)

Leverage, residuals, and influence measures

DX.1: $\sum_i h_i = p$ (trace of hat matrix equals rank)

The hat matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top$ is a projection matrix onto the column space of $\mathbf{X}$. Its trace equals the rank of $\mathbf{X}$.

💡 Intuition

Each leverage $h_i$ measures how much observation $i$ influences its own fitted value. The sum of all leverages equals the “effective number of parameters” being estimated.

Depends: OLS.1, LA.1

Enables: DX.3, DX.4

Test: test_diagnostics_hypothesis.py::TestLeverageSum::test_leverage_sum_equals_p

Reference: Cook & Weisberg, 1982


DX.2: $h_i \leq 1$ for all observations

Since $\mathbf{H}$ is a projection matrix ($\mathbf{H}^2 = \mathbf{H}$), its eigenvalues are 0 or 1. The diagonal elements (leverages) are bounded by the largest eigenvalue.

💡 Intuition

A leverage of 1 would mean the observation completely determines its own fitted value. This is only possible when the corresponding standard basis vector lies in $\text{col}(\mathbf{X})$.

Test: test_diagnostics_hypothesis.py::TestLeverageBounds::test_leverage_upper_bound

Reference: Hastie et al., 2009


DX.3: $D_i = \left(\dfrac{e_i^2}{p\,\hat{\sigma}^2}\right)\left(\dfrac{h_i}{(1-h_i)^2}\right)$

Cook’s distance measures the influence of observation i on all fitted values. It combines residual magnitude and leverage.

💡 Intuition

Large residuals OR high leverage can produce large Cook’s distance. An influential point either pulls the fit strongly (high leverage) or fits poorly (large residual), or both.

Depends: DX.1, DX.2

Test: test_diagnostics_hypothesis.py::TestCooksDistanceFormula::test_cooks_distance_formula

Reference: Cook & Weisberg, 1982


DX.4: $r_i = \dfrac{e_i}{\hat{\sigma}\sqrt{1-h_i}}$

Studentized (internally standardized) residuals adjust raw residuals for their expected variance.

💡 Intuition

High leverage points have smaller expected residual variance because they influence the fit more. Studentizing corrects for this to put all residuals on comparable scale.

Depends: DX.1, OLS.1

Test: test_diagnostics_hypothesis.py::TestStudentizedResidualsFormula::test_studentized_residuals_formula

Reference: Cook & Weisberg, 1982
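The four diagnostics (DX.1–DX.4) can be computed in one short NumPy sketch (illustrative data, not the library's diagnostics module):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)                                  # leverages (DX.1, DX.2)
e = y - H @ y                                   # residuals
sigma2 = e @ e / (n - p)                        # residual variance estimate

r = e / np.sqrt(sigma2 * (1 - h))               # studentized residuals (DX.4)
D = (e**2 / (p * sigma2)) * (h / (1 - h)**2)    # Cook's distance (DX.3)
```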


Estimated Marginal Means (EMM)

EMM computation and variance propagation

EMM.1: $\text{EMMs} = \mathbf{X}_{\text{ref}}\,\hat{\boldsymbol{\beta}}$

Estimated marginal means are computed as linear combinations of the coefficient vector using the prediction matrix.

💡 Intuition

$\mathbf{X}_{\text{ref}}$ encodes the covariate values at each reference grid point. The EMM at point $i$ is simply the linear predictor $\mathbf{X}_{\text{ref},i}\,\hat{\boldsymbol{\beta}}$.

Depends: OLS.1

Enables: EMM.2

Test: test_emm_hypothesis.py::TestEMMLinearPrediction::test_emm_equals_xref_times_coef

Reference: Searle et al., 1980


EMM.2: $\text{Var}(\mathbf{X}_{\text{ref}}\,\hat{\boldsymbol{\beta}}) = \mathbf{X}_{\text{ref}}\,\mathbf{V}\,\mathbf{X}_{\text{ref}}^\top$

The variance-covariance of EMMs is computed by propagating the coefficient variance through the linear transformation.

💡 Intuition

This is the standard variance formula for linear combinations: $\text{Var}(\mathbf{A}\mathbf{x}) = \mathbf{A}\,\text{Var}(\mathbf{x})\,\mathbf{A}^\top$. Here $\mathbf{A} = \mathbf{X}_{\text{ref}}$ and $\mathbf{x} = \hat{\boldsymbol{\beta}}$.

Depends: EMM.1

Enables: INF.4

Test: test_emm_hypothesis.py::TestEMMVariancePropagation::test_vcov_emm_formula

Reference: Greene, 2018
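EMM.1 and EMM.2 together amount to a linear prediction plus variance propagation. A sketch with a hypothetical 2-row reference grid (the grid values and design are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([2.0, 0.5, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
V = (resid @ resid / (n - p)) * np.linalg.inv(X.T @ X)

# Hypothetical reference grid: two covariate settings to predict at
X_ref = np.array([[1.0, -1.0, 0.5],
                  [1.0,  1.0, 0.5]])

emm = X_ref @ beta_hat                 # EMM.1
vcov_emm = X_ref @ V @ X_ref.T         # EMM.2
se_emm = np.sqrt(np.diag(vcov_emm))    # INF.1 applied to the EMMs
```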


References
  1. Golub, G. H., & Van Loan, C. F. (2013). Matrix Computations (4th ed.). Johns Hopkins University Press.
  2. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer. https://hastie.su.domains/ElemStatLearn/
  3. Greene, W. H. (2018). Econometric Analysis (8th ed.). Pearson.
  4. McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). Chapman & Hall.
  5. Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Duxbury/Thomson Learning.
  6. Cook, R. D., & Weisberg, S. (1982). Residuals and Influence in Regression. Chapman.
  7. Searle, S. R., Speed, F. M., & Milliken, G. A. (1980). Population Marginal Means in the Linear Model: An Alternative to Least Squares Means. The American Statistician, 34(4), 216–221. https://doi.org/10.1080/00031305.1980.10483031