Mathematical Theorems¶
This page documents the mathematical theorems verified by bossanova’s property-based tests. Each theorem is a statement that must hold for any valid input, verified automatically by Hypothesis across thousands of random test cases.
How to Read This Page¶
Each theorem includes:
Statement: The mathematical property being verified
Intuition: Why this property matters (click to expand)
Depends: Prerequisite theorems that must hold first
Enables: Downstream theorems that build on this one
Test: The actual test function that verifies this property
Reference: Academic citations for further reading
Theorem Dependency Graph¶
The following diagram shows how theorems build on each other. Arrows indicate dependencies (A → B means “A is required for B”).
Linear Algebra (LA)¶
Matrix decompositions and fundamental operations
LA.4: X = U S Vᵀ (SVD reconstruction)¶
The singular value decomposition always exists and allows exact reconstruction of the original matrix.
💡 Intuition
SVD decomposes X into orthogonal matrices U, V and diagonal singular values S. The product U S Vᵀ reconstructs X exactly (up to numerical precision).
Test: test_linalg_hypothesis.py::TestSVDReconstruction::test_svd_reconstruction
Reference: Golub & Van Loan, 2013
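The property can be sketched directly with NumPy (this is an illustrative check, not bossanova's actual Hypothesis test):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# Thin SVD: X = U @ diag(S) @ Vt, with orthonormal columns in U and rows in Vt
U, S, Vt = np.linalg.svd(X, full_matrices=False)
X_rec = U @ np.diag(S) @ Vt

assert np.allclose(X, X_rec)  # exact reconstruction up to floating point
```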
LA.5: X X⁺ X = X (Moore-Penrose property 1)¶
The pseudoinverse X⁺ acts as a “generalized inverse”: applying X⁺ and then X returns to the original value (on the column space).
💡 Intuition
X⁺ inverts X on its column space and projects to zero on the null space. Applying X again maps back to the column space, recovering X exactly.
Depends: LA.4
Enables: OLS.1, LMM.1
Test: test_linalg_hypothesis.py::TestMoorePenrosePseudoinverse::test_moore_penrose_property_1
Reference: Golub & Van Loan, 2013
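A minimal NumPy sketch of the same property (independent of the library's test suite):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))

X_pinv = np.linalg.pinv(X)

# Moore-Penrose property 1: X @ X⁺ @ X == X
assert np.allclose(X @ X_pinv @ X, X)
```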
LA.7: log det(A) = 2·Σᵢ log(Lᵢᵢ) where A = L Lᵀ¶
For positive definite A with Cholesky factor L:
det(A) = det(L)·det(Lᵀ) = det(L)² = (Πᵢ Lᵢᵢ)²
log det(A) = 2·log det(L) = 2·Σᵢ log(Lᵢᵢ)
💡 Intuition
Once the Cholesky factor is available, the log-determinant reduces from a general O(n³) computation to an O(n) sum over the diagonal of L. The product of the diagonal elements of L gives det(L), and squaring gives det(A).
Depends: LA.3
Enables: LMM.4
Test: test_linalg_hypothesis.py::TestLogDeterminantViaCholesky::test_logdet_cholesky_formula
Reference: Golub & Van Loan, 2013
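The identity is easy to verify numerically against NumPy's reference `slogdet` (an illustrative sketch, not the project's test):

```python
import numpy as np

rng = np.random.default_rng(2)
A0 = rng.normal(size=(4, 4))
A = A0 @ A0.T + 4 * np.eye(4)        # positive definite by construction

L = np.linalg.cholesky(A)            # A = L @ L.T, L lower triangular
logdet_chol = 2.0 * np.sum(np.log(np.diag(L)))

sign, logdet_ref = np.linalg.slogdet(A)
assert sign == 1.0 and np.isclose(logdet_chol, logdet_ref)
```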
Ordinary Least Squares (OLS)¶
Normal equations and solution properties
OLS.1: Xᵀ(y − Xβ̂) = 0 via QR solve¶
At the OLS solution, Xᵀ(y − Xβ̂) = 0.
💡 Intuition
Residuals represent unexplained variation. Zero correlation with X means we’ve extracted all linear signal. Geometrically: OLS projects y onto col(X), so residuals are perpendicular to this subspace.
Depends: LA.1, LA.2
Test: test_linalg_hypothesis.py::TestNormalEquations::test_normal_equations_qr
Reference: Hastie et al., 2009
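A NumPy sketch of the normal equations via a QR solve (illustrative; bossanova's own solver may differ in details):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

# Solve min ||y - X b||² via QR: R b = Qᵀ y
Q, R = np.linalg.qr(X)
beta = np.linalg.solve(R, Q.T @ y)

resid = y - X @ beta
# Normal equations: Xᵀ @ resid == 0 at the solution
assert np.allclose(X.T @ resid, 0.0, atol=1e-8)
```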
OLS.4: Each column of X is orthogonal to residuals¶
This is equivalent to OLS.1 but stated column-by-column.
💡 Intuition
If residuals were correlated with any predictor, we could improve the fit by adjusting that coefficient. Orthogonality means the solution is optimal.
Depends: OLS.1
Test: test_linalg_hypothesis.py::TestResidualOrthogonality::test_residuals_orthogonal_to_columns
Reference: Greene, 2018
OLS.5: H² = H (projection property)¶
A projection matrix satisfies H² = H, because projecting twice onto the same subspace gives the same result as projecting once.
💡 Intuition
If ŷ = Hy, then Hŷ = H(Hy) = H²y = Hy = ŷ. Re-projecting the fitted values doesn’t change them.
Test: test_diagnostics_hypothesis.py::TestHatMatrixProperties::test_hat_matrix_idempotent
OLS.6: tr(H) = p (trace equals number of parameters)¶
The trace of a projection matrix equals its rank. For full-rank X, this equals the number of columns p.
💡 Intuition
The trace counts the dimension of the projection subspace. Since H projects onto col(X), which has dimension p when X has full column rank, tr(H) = p.
Depends: OLS.5
Enables: DX.1
Test: test_diagnostics_hypothesis.py::TestHatMatrixProperties::test_hat_matrix_trace_equals_p
Reference: Hastie et al., 2009
OLS.7: Eigenvalues of H are 0 or 1¶
A projection matrix has only eigenvalues 0 and 1. For a rank-p projection in n-space, there are p eigenvalues equal to 1 and (n-p) eigenvalues equal to 0.
💡 Intuition
H projects vectors onto a p-dimensional subspace. Vectors in the subspace are unchanged (eigenvalue 1), vectors orthogonal to it are annihilated (eigenvalue 0).
Depends: OLS.5
Test: test_diagnostics_hypothesis.py::TestHatMatrixProperties::test_hat_matrix_eigenvalues_binary
Reference: Golub & Van Loan, 2013
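The three hat-matrix theorems (OLS.5 through OLS.7) can be checked together in a few lines of NumPy (an illustrative sketch, not the library's test code):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 15, 4
X = rng.normal(size=(n, p))

# Hat matrix H = X (XᵀX)⁻¹ Xᵀ, computed via the pseudoinverse for stability
H = X @ np.linalg.pinv(X)

assert np.allclose(H @ H, H)            # OLS.5: idempotent
assert np.isclose(np.trace(H), p)       # OLS.6: trace equals p
eig = np.linalg.eigvalsh(H)             # H is symmetric
assert np.all((np.abs(eig) < 1e-8) | (np.abs(eig - 1) < 1e-8))  # OLS.7: 0/1 eigenvalues
```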
Weighted Least Squares (WLS)¶
IRLS and GLM convergence
WLS.1: XᵀW(y − Xβ̂) = 0 for weighted least squares¶
At the WLS solution, the weighted score XᵀW(y − Xβ̂) = 0.
💡 Intuition
The weighted residuals should have zero correlation with X when weighted by W. This is the optimality condition for WLS.
Depends: LA.1
Enables: WLS.2, INF.1
Test: test_glm_hypothesis.py::TestWeightedNormalEquations::test_weighted_normal_equations
Reference: McCullagh & Nelder, 1989
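The weighted normal equations can be sketched with NumPy by solving WLS through square-root rescaling (illustrative; names and approach are assumptions, not bossanova's implementation):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.uniform(0.5, 2.0, size=20)   # positive weights

# WLS via rescaling: ordinary least squares on sqrt(w)-scaled data
sw = np.sqrt(w)
beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]

# Weighted normal equations: Xᵀ W (y - X beta) == 0
assert np.allclose(X.T @ (w * (y - X @ beta)), 0.0, atol=1e-8)
```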
WLS.3: Xᵀ(y − μ̂) ≈ 0 at convergence for binomial GLM¶
At the MLE, the score (gradient of the log-likelihood) equals zero. For a binomial GLM with the canonical logit link: score = Xᵀ(y − μ̂).
Note: The general GLM score involves weights, but for the canonical link the weights cancel, giving the simple form Xᵀ(y − μ̂).
💡 Intuition
The IRLS algorithm minimizes deviance by iteratively solving weighted least squares. At convergence, the gradient is zero.
Test: test_glm_hypothesis.py::TestScoreEquationsAtConvergence::test_binomial_score_at_convergence
Reference: McCullagh & Nelder, 1989
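A self-contained IRLS sketch for logistic regression makes the convergence claim concrete; this is a textbook implementation for illustration, not bossanova's IRLS routine:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
eta_true = X @ np.array([0.3, 1.0, -0.5])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-eta_true))).astype(float)

beta = np.zeros(X.shape[1])
for _ in range(25):                       # IRLS iterations
    mu = 1 / (1 + np.exp(-(X @ beta)))
    w = mu * (1 - mu)                     # working weights
    z = X @ beta + (y - mu) / w           # working response
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)[0]

mu = 1 / (1 + np.exp(-(X @ beta)))
score = X.T @ (y - mu)                    # canonical-link score
assert np.max(np.abs(score)) < 1e-6       # ≈ 0 at convergence
```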
Inference (INF)¶
Standard errors, confidence intervals, and tests
INF.1: SEⱼ = √(VCOVⱼⱼ)¶
Standard errors are the square roots of the diagonal elements of the variance-covariance matrix.
💡 Intuition
The diagonal of VCOV contains Var(β̂ⱼ), and SE(β̂ⱼ) = √Var(β̂ⱼ). This is the fundamental relationship between the covariance matrix and standard errors.
Test: test_inference_hypothesis.py::TestStandardErrorsFromVCOV::test_se_equals_sqrt_diag_vcov
Reference: Greene, 2018
INF.2: (CI_lo + CI_hi)/2 = β̂¶
Confidence intervals based on symmetric distributions (t or z) are centered at the point estimate.
💡 Intuition
The CI is constructed as β̂ ± critical × SE. The midpoint of [β̂ − c·SE, β̂ + c·SE] is exactly β̂.
Depends: INF.1
Test: test_inference_hypothesis.py::TestConfidenceIntervalSymmetry::test_ci_midpoint_equals_estimate
Reference: Casella & Berger, 2002
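Both INF.1 and INF.2 reduce to a few lines of NumPy; the toy covariance matrix and the 1.96 critical value below are illustrative assumptions:

```python
import numpy as np

# Toy variance-covariance matrix and point estimates (illustrative values)
vcov = np.array([[0.25, 0.05],
                 [0.05, 0.16]])
beta = np.array([1.2, -0.7])

se = np.sqrt(np.diag(vcov))              # INF.1: SE = sqrt of diagonal
crit = 1.96                              # z critical value for a 95% CI
lo, hi = beta - crit * se, beta + crit * se

assert np.allclose((lo + hi) / 2, beta)  # INF.2: CI centered at the estimate
```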
INF.4: W = (Lβ̂)' @ (L V L')⁻¹ @ (Lβ̂)¶
The Wald statistic for testing H₀: Lβ = 0 is a quadratic form in the contrast estimates, weighted by the inverse contrast variance.
💡 Intuition
The Wald statistic measures how many standard errors away from zero the contrast estimates are, accounting for their correlations.
Depends: EMM.2
Enables: INF.5
Test: test_emm_hypothesis.py::TestWaldStatistic::test_wald_formula
Reference: Greene, 2018
INF.5: F = W / q¶
The F-statistic is the Wald statistic divided by the number of constraints (rows in the contrast matrix).
💡 Intuition
The Wald statistic is approximately chi-squared with q df. Dividing by q gives an F-distributed statistic.
Depends: INF.4
Test: test_emm_hypothesis.py::TestFTestFromWald::test_f_equals_wald_over_q
Reference: Greene, 2018
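INF.4 and INF.5 together amount to one quadratic form and one division; the contrast matrix and covariance below are made-up illustrative inputs:

```python
import numpy as np

rng = np.random.default_rng(7)
p = 4
beta = rng.normal(size=p)                # toy coefficient estimates
A = rng.normal(size=(p, p))
V = A @ A.T + np.eye(p)                  # positive definite vcov

L = np.array([[1.0, -1.0, 0.0, 0.0],     # q = 2 contrast rows
              [0.0, 1.0, -1.0, 0.0]])
q = L.shape[0]

Lb = L @ beta
W = Lb @ np.linalg.solve(L @ V @ L.T, Lb)   # Wald statistic (INF.4)
F = W / q                                    # F statistic (INF.5)

assert W >= 0 and np.isclose(F, W / q)
```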
Diagnostics (DX)¶
Leverage, residuals, and influence measures
DX.1: Σᵢ hᵢᵢ = p (trace of hat matrix equals rank)¶
The hat matrix H = X(XᵀX)⁻¹Xᵀ is a projection matrix onto the column space of X. Its trace equals the rank of X.
💡 Intuition
Each leverage hᵢᵢ measures how much observation i influences its own fitted value. The sum of all leverages equals the “effective number of parameters” being estimated.
Depends: OLS.1, LA.1
Test: test_diagnostics_hypothesis.py::TestLeverageSum::test_leverage_sum_equals_p
Reference: Cook & Weisberg, 1982
DX.2: 0 ≤ hᵢᵢ ≤ 1 for all observations¶
Since H is a projection matrix (H² = H), its eigenvalues are 0 or 1. The diagonal elements (leverages) are bounded by the largest eigenvalue.
💡 Intuition
A leverage of 1 would mean the observation completely determines its own fitted value. This is only possible if the observation lies exactly on a basis vector.
Test: test_diagnostics_hypothesis.py::TestLeverageBounds::test_leverage_upper_bound
Reference: Hastie et al., 2009
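DX.1 and DX.2 can be checked together on the leverage vector (a NumPy sketch, not the project's test code):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 30, 5
X = rng.normal(size=(n, p))

H = X @ np.linalg.pinv(X)         # hat matrix
h = np.diag(H)                    # leverages h_ii

assert np.isclose(h.sum(), p)                    # DX.1: sum of leverages = p
assert np.all((h > -1e-12) & (h < 1 + 1e-12))    # DX.2: 0 <= h_ii <= 1
```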
DX.3: Dᵢ = (eᵢ²/(p·s²)) · (hᵢᵢ/(1−hᵢᵢ)²)¶
Cook’s distance measures the influence of observation i on all fitted values. It combines residual magnitude and leverage.
💡 Intuition
Large residuals OR high leverage can produce large Cook’s distance. An influential point either pulls the fit strongly (high leverage) or fits poorly (large residual), or both.
Test: test_diagnostics_hypothesis.py::TestCooksDistanceFormula::test_cooks_distance_formula
Reference: Cook & Weisberg, 1982
DX.4: rᵢ = eᵢ / (s·√(1−hᵢᵢ))¶
Studentized (internally standardized) residuals adjust raw residuals for their expected variance.
💡 Intuition
High leverage points have smaller expected residual variance because they influence the fit more. Studentizing corrects for this to put all residuals on comparable scale.
Test: test_diagnostics_hypothesis.py::TestStudentizedResidualsFormula::test_studentized_residuals_formula
Reference: Cook & Weisberg, 1982
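DX.3 and DX.4 are algebraically linked: substituting the studentized residual rᵢ into Cook's distance gives Dᵢ = (rᵢ²/p)·hᵢᵢ/(1−hᵢᵢ). A NumPy sketch of both formulas and their equivalence (illustrative, not the library's test):

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta                          # raw residuals
h = np.diag(X @ np.linalg.pinv(X))        # leverages
s2 = e @ e / (n - p)                      # residual variance estimate

r = e / np.sqrt(s2 * (1 - h))             # DX.4: studentized residuals
D = e**2 / (p * s2) * h / (1 - h)**2      # DX.3: Cook's distance

# Equivalent form in terms of studentized residuals
assert np.allclose(D, r**2 / p * h / (1 - h))
```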
Estimated Marginal Means (EMM)¶
EMM computation and variance propagation
EMM.1: EMMs = X_ref @ β̂¶
Estimated marginal means are computed as linear combinations of the coefficient vector using the prediction matrix.
💡 Intuition
X_ref encodes the covariate values at each reference grid point. The EMM at point i is simply the linear predictor X_ref[i] @ β̂.
Depends: OLS.1
Enables: EMM.2
Test: test_emm_hypothesis.py::TestEMMLinearPrediction::test_emm_equals_xref_times_coef
Reference: Searle et al., 1980
EMM.2: Var(X_ref @ β̂) = X_ref @ V @ X_ref'¶
The variance-covariance of EMMs is computed by propagating the coefficient variance through the linear transformation.
💡 Intuition
This is the standard variance formula for linear combinations: Var(Ax) = A @ Var(x) @ A'. Here A = X_ref and x = β̂.
Depends: EMM.1
Enables: INF.4
Test: test_emm_hypothesis.py::TestEMMVariancePropagation::test_vcov_emm_formula
Reference: Greene, 2018
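EMM.1 and EMM.2 together are two matrix products; the reference grid and covariance below are toy illustrative inputs:

```python
import numpy as np

rng = np.random.default_rng(10)
p = 3
beta = rng.normal(size=p)                 # toy fitted coefficients
A = rng.normal(size=(p, p))
V = A @ A.T + np.eye(p)                   # vcov of beta (positive definite)

X_ref = np.array([[1.0, 0.0, 0.0],        # reference-grid rows
                  [1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0]])

emm = X_ref @ beta                        # EMM.1: linear predictor per grid row
vcov_emm = X_ref @ V @ X_ref.T            # EMM.2: variance propagation

# Diagonal follows the quadratic form x' V x for each grid row
expected_diag = [V[0, 0],
                 V[0, 0] + 2 * V[0, 1] + V[1, 1],
                 V[0, 0] + 2 * V[0, 2] + V[2, 2]]
assert np.allclose(np.diag(vcov_emm), expected_diag)
```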
References¶
- Golub, G. H., & Van Loan, C. F. (2013). Matrix Computations (4th ed.). Johns Hopkins University Press.
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer. https://hastie.su.domains/ElemStatLearn/
- Greene, W. H. (2018). Econometric Analysis (8th ed.). Pearson.
- McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). Chapman & Hall.
- Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Duxbury/Thomson Learning.
- Cook, R. D., & Weisberg, S. (1982). Residuals and Influence in Regression. Chapman.
- Searle, S. R., Speed, F. M., & Milliken, G. A. (1980). Population Marginal Means in the Linear Model: An Alternative to Least Squares Means. The American Statistician, 34(4), 216–221. doi:10.1080/00031305.1980.10483031