
Theorem Reference

UCSD Psychology

Mathematical Theorems

This page documents the mathematical theorems verified by bossanova’s property-based tests. Each theorem is a statement that must hold for any valid input, verified automatically by Hypothesis across thousands of random test cases.

How to Read This Page

Each theorem includes:

- a formal statement of the property,
- a 💡 Intuition explaining why it holds,
- its prerequisites (Depends) and the theorems that build on it (Enables),
- the Hypothesis test that verifies it (Test), and
- a literature reference.

Theorem Dependency Graph

The following diagram shows how theorems build on each other. Arrows indicate dependencies (A → B means “A is required for B”).

Linear Algebra (LA)

Matrix decompositions and fundamental operations

LA.4: $\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$ (SVD reconstruction)

The singular value decomposition $\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$ always exists and allows exact reconstruction of the original matrix.

💡 Intuition

SVD decomposes $\mathbf{X}$ into orthogonal matrices $\mathbf{U}$, $\mathbf{V}$ and diagonal singular values $\boldsymbol{\Sigma}$. The product $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$ reconstructs $\mathbf{X}$ exactly (up to numerical precision).

Enables: LA.5, OLS.1

Test: test_linalg_hypothesis.py::TestSVDReconstruction::test_svd_reconstruction

Reference: Golub & Van Loan, 2013
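A minimal NumPy sketch of this property (not bossanova's actual Hypothesis test; the seed and matrix shape are illustrative):

```python
import numpy as np

# A hypothetical rectangular matrix; the real test draws many such matrices.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# Thin SVD: U is 5x3 with orthonormal columns, s holds 3 singular values, Vt is 3x3.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_rec = U @ np.diag(s) @ Vt  # reconstruct X = U Sigma V^T
```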


LA.5: $\mathbf{X}\mathbf{X}^+\mathbf{X} = \mathbf{X}$ (Moore–Penrose property 1)

The pseudoinverse $\mathbf{X}^+$ acts as a “generalized inverse”: applying $\mathbf{X}^+$ and then $\mathbf{X}$ returns the original value (on the column space).

💡 Intuition

$\mathbf{X}^+$ inverts $\mathbf{X}$ on its column space and projects to zero on the orthogonal complement. Applying $\mathbf{X}$ again maps back to the column space, recovering $\mathbf{X}$ exactly.

Depends: LA.4

Enables: OLS.1, LMM.1

Test: test_linalg_hypothesis.py::TestMoorePenrosePseudoinverse::test_moore_penrose_property_1

Reference: Golub & Van Loan, 2013
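The property can be checked directly with NumPy's SVD-based pseudoinverse (a sketch with an illustrative matrix, not the library's test):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))   # tall matrix, generically full column rank

X_pinv = np.linalg.pinv(X)    # Moore-Penrose pseudoinverse via SVD
```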


LA.7: $\log|\mathbf{A}| = 2\sum_i \log(L_{ii})$ where $\mathbf{A} = \mathbf{L}\mathbf{L}^\top$

For positive definite $\mathbf{A}$ with Cholesky factor $\mathbf{L}$:

$\log|\mathbf{A}| = 2\sum_i \log(L_{ii})$

💡 Intuition

The Cholesky decomposition reduces the determinant computation to O(n) once the O(n³) factorization is done. The product of the diagonal elements of $\mathbf{L}$ gives $|\mathbf{L}|$, and squaring gives $|\mathbf{A}|$.

Depends: LA.3

Enables: LMM.4

Test: test_linalg_hypothesis.py::TestLogDeterminantViaCholesky::test_logdet_cholesky_formula

Reference: Golub & Van Loan, 2013
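A small sketch of the identity, comparing the Cholesky-based log-determinant against NumPy's `slogdet` (the matrix construction is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4.0 * np.eye(4)   # symmetric positive definite by construction

L = np.linalg.cholesky(A)       # A = L L^T, L lower triangular
logdet_chol = 2.0 * np.sum(np.log(np.diag(L)))
sign, logdet_ref = np.linalg.slogdet(A)
```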


Ordinary Least Squares (OLS)

Normal equations and solution properties

OLS.1: $\mathbf{X}^\top(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0}$ via QR solve

At the OLS solution, $\mathbf{X}^\top(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0}$.

💡 Intuition

Residuals represent unexplained variation. Zero correlation with $\mathbf{X}$ means we’ve extracted all linear signal. Geometrically: OLS projects $\mathbf{y}$ onto $\text{col}(\mathbf{X})$, so residuals are perpendicular to this subspace.

Depends: LA.1, LA.2

Enables: OLS.5, DX.1

Test: test_linalg_hypothesis.py::TestNormalEquations::test_normal_equations_qr

Reference: Hastie et al., 2009
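A QR-based sketch of the normal equations (an illustrative design and response, not the library's fitting code):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)

# Solve least squares via QR: X = QR, then R beta = Q^T y
Q, R = np.linalg.qr(X)
beta_hat = np.linalg.solve(R, Q.T @ y)

score = X.T @ (y - X @ beta_hat)   # should vanish at the solution
```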


OLS.4: Each column of X is orthogonal to residuals

This is equivalent to OLS.1 but stated column-by-column.

💡 Intuition

If residuals were correlated with any predictor, we could improve the fit by adjusting that coefficient. Orthogonality means the solution is optimal.

Depends: OLS.1

Test: test_linalg_hypothesis.py::TestResidualOrthogonality::test_residuals_orthogonal_to_columns

Reference: Greene, 2018


OLS.5: $\mathbf{H}^2 = \mathbf{H}$ (projection property)

A projection matrix satisfies $\mathbf{H}^2 = \mathbf{H}$. This is because projecting twice onto the same subspace gives the same result as projecting once.

💡 Intuition

If $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$, then $\mathbf{H}\hat{\mathbf{y}} = \mathbf{H}(\mathbf{H}\mathbf{y}) = \mathbf{H}^2\mathbf{y} = \mathbf{H}\mathbf{y} = \hat{\mathbf{y}}$. Re-projecting the fitted values doesn’t change them.

Test: test_diagnostics_hypothesis.py::TestHatMatrixProperties::test_hat_matrix_idempotent


OLS.6: $\text{tr}(\mathbf{H}) = p$ (trace equals number of parameters)

The trace of a projection matrix equals its rank. For full-rank X, this equals the number of columns p.

💡 Intuition

The trace counts the dimension of the projection subspace. Since $\mathbf{H}$ projects onto $\text{col}(\mathbf{X})$, which has dimension $p$ when $\mathbf{X}$ has full column rank, $\text{tr}(\mathbf{H}) = p$.

Depends: OLS.5

Enables: DX.1

Test: test_diagnostics_hypothesis.py::TestHatMatrixProperties::test_hat_matrix_trace_equals_p

Reference: Hastie et al., 2009


OLS.7: Eigenvalues of H are 0 or 1

A projection matrix has only eigenvalues 0 and 1. For a rank-p projection in n-space, there are p eigenvalues equal to 1 and (n-p) eigenvalues equal to 0.

💡 Intuition

H projects vectors onto a p-dimensional subspace. Vectors in the subspace are unchanged (eigenvalue 1), vectors orthogonal to it are annihilated (eigenvalue 0).

Depends: OLS.5

Test: test_diagnostics_hypothesis.py::TestHatMatrixProperties::test_hat_matrix_eigenvalues_binary

Reference: Golub & Van Loan, 2013
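The three hat-matrix properties (OLS.5, OLS.6, OLS.7) can be checked together in a few lines of NumPy (shapes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 5
X = rng.normal(size=(n, p))   # generically full column rank

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)

# Symmetrize before eigvalsh to absorb tiny numerical asymmetry
eigs = np.sort(np.linalg.eigvalsh((H + H.T) / 2))
```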


Weighted Least Squares (WLS)

IRLS and GLM convergence

WLS.1: $\mathbf{X}^\top\mathbf{W}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0}$ for weighted least squares

At the WLS solution, the weighted score $\mathbf{X}^\top\mathbf{W}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$ equals zero.

💡 Intuition

The weighted residuals should have zero correlation with X when weighted by W. This is the optimality condition for WLS.

Depends: LA.1

Enables: WLS.2, INF.1

Test: test_glm_hypothesis.py::TestWeightedNormalEquations::test_weighted_normal_equations

Reference: McCullagh & Nelder, 1989
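A sketch of the weighted normal equations with hypothetical positive weights (diagonal $\mathbf{W}$ stored as a vector):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)   # hypothetical positive case weights

# WLS solution: beta = (X' W X)^{-1} X' W y, with W = diag(w)
beta_w = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
weighted_score = X.T @ (w * (y - X @ beta_w))
```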


WLS.3: $\mathbf{X}^\top(\mathbf{y} - \boldsymbol{\mu}) \approx \mathbf{0}$ at convergence for binomial GLM

At the MLE, the score (gradient of the log-likelihood) equals zero. For a binomial GLM with canonical logit link, the score is $\mathbf{X}^\top(\mathbf{y} - \boldsymbol{\mu})$.

Note: The general GLM score involves weights, but for the canonical link the weights cancel, giving the simple form $\mathbf{X}^\top(\mathbf{y} - \boldsymbol{\mu})$.

💡 Intuition

The IRLS algorithm minimizes deviance by iteratively solving weighted least squares. At convergence, the gradient is zero.

Test: test_glm_hypothesis.py::TestScoreEquationsAtConvergence::test_binomial_score_at_convergence

Reference: McCullagh & Nelder, 1989
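A plain-NumPy IRLS sketch for logistic regression illustrating the vanishing score at convergence (the data-generating coefficients and iteration count are illustrative, not bossanova's implementation):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.3, 1.0, -0.5])           # hypothetical coefficients
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true))).astype(float)

# IRLS for the canonical logit link
beta = np.zeros(3)
for _ in range(25):
    eta = X @ beta
    mu = 1 / (1 + np.exp(-eta))
    w = mu * (1 - mu)                            # IRLS weights
    z = eta + (y - mu) / w                       # working response
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))

mu = 1 / (1 + np.exp(-(X @ beta)))
score = X.T @ (y - mu)                           # approx zero at convergence
```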


Inference (INF)

Standard errors, confidence intervals, and tests

INF.1: $\text{SE} = \sqrt{\text{diag}(\mathbf{V})}$

Standard errors are the square roots of the diagonal elements of the variance-covariance matrix.

💡 Intuition

The diagonal of $\mathbf{V}$ contains $\text{Var}(\hat{\beta}_j)$, and $\text{SE}_j = \sqrt{\text{Var}(\hat{\beta}_j)}$. This is the fundamental relationship between the covariance matrix and standard errors.

Enables: INF.2, INF.4

Test: test_inference_hypothesis.py::TestStandardErrorsFromVCOV::test_se_equals_sqrt_diag_vcov

Reference: Greene, 2018


INF.2: $(\text{CI}_{\text{lo}} + \text{CI}_{\text{hi}})/2 = \hat{\boldsymbol{\beta}}$

Confidence intervals based on symmetric distributions (t or z) are centered at the point estimate.

💡 Intuition

The CI is constructed as $\hat{\boldsymbol{\beta}} \pm c \times \text{SE}$ for a critical value $c$. The midpoint of $[\hat{\boldsymbol{\beta}} - c \times \text{SE},\ \hat{\boldsymbol{\beta}} + c \times \text{SE}]$ is exactly $\hat{\boldsymbol{\beta}}$.

Depends: INF.1

Enables: EMM.1, INF.4

Test: test_inference_hypothesis.py::TestConfidenceIntervalSymmetry::test_ci_midpoint_equals_estimate

Reference: Casella & Berger, 2002


INF.4: $W = (\mathbf{L}\hat{\boldsymbol{\beta}})^\top(\mathbf{L}\mathbf{V}\mathbf{L}^\top)^{-1}(\mathbf{L}\hat{\boldsymbol{\beta}})$

The Wald statistic for testing $H_0\colon \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$ is a quadratic form in the contrast estimates, weighted by the inverse contrast variance.

💡 Intuition

The Wald statistic measures how many standard errors away from zero the contrast estimates are, accounting for their correlations.

Depends: EMM.2

Enables: INF.5

Test: test_emm_hypothesis.py::TestWaldStatistic::test_wald_formula

Reference: Greene, 2018


INF.5: F = W / q

The F-statistic is the Wald statistic divided by the number of constraints (rows in the contrast matrix).

💡 Intuition

The Wald statistic is approximately chi-squared with q df. Dividing by q gives an F-distributed statistic.

Depends: INF.4

Test: test_emm_hypothesis.py::TestFTestFromWald::test_f_equals_wald_over_q

Reference: Greene, 2018
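A sketch of INF.4 and INF.5 together for an OLS fit, using a hypothetical 2-row contrast matrix (the design, coefficients, and hypothesis are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 60, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.5, 0.0, -0.2]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
V = (resid @ resid / (n - p)) * np.linalg.inv(X.T @ X)   # vcov(beta_hat)

# Contrast matrix for H0: beta_1 = beta_2 = 0 (q = 2 constraints)
L = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
Lb = L @ beta_hat
W = Lb @ np.linalg.solve(L @ V @ L.T, Lb)   # Wald statistic (INF.4)
q = L.shape[0]
F = W / q                                    # F statistic (INF.5)
```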


Diagnostics (DX)

Leverage, residuals, and influence measures

DX.1: $\sum_i h_i = p$ (trace of hat matrix equals rank)

The hat matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top$ is a projection matrix onto the column space of $\mathbf{X}$. Its trace equals the rank of $\mathbf{X}$.

💡 Intuition

Each leverage $h_i$ measures how much observation $i$ influences its own fitted value. The sum of all leverages equals the “effective number of parameters” being estimated.

Depends: OLS.1, LA.1

Enables: DX.3, DX.4

Test: test_diagnostics_hypothesis.py::TestLeverageSum::test_leverage_sum_equals_p

Reference: Cook & Weisberg, 1982


DX.2: $h_i \leq 1$ for all observations

Since $\mathbf{H}$ is a projection matrix ($\mathbf{H}^2 = \mathbf{H}$), its eigenvalues are 0 or 1. The diagonal elements (leverages) are bounded by the largest eigenvalue.

💡 Intuition

A leverage of 1 would mean the observation completely determines its own fitted value. This is only possible when the corresponding standard basis vector lies in $\text{col}(\mathbf{X})$.

Test: test_diagnostics_hypothesis.py::TestLeverageBounds::test_leverage_upper_bound

Reference: Hastie et al., 2009


DX.3: $D_i = \left(\dfrac{e_i^2}{p\,\hat{\sigma}^2}\right)\left(\dfrac{h_i}{(1-h_i)^2}\right)$

Cook’s distance measures the influence of observation i on all fitted values. It combines residual magnitude and leverage.

💡 Intuition

Large residuals OR high leverage can produce large Cook’s distance. An influential point either pulls the fit strongly (high leverage) or fits poorly (large residual), or both.

Depends: DX.1, DX.2

Test: test_diagnostics_hypothesis.py::TestCooksDistanceFormula::test_cooks_distance_formula

Reference: Cook & Weisberg, 1982


DX.4: $r_i = \dfrac{e_i}{\hat{\sigma}\sqrt{1-h_i}}$

Studentized (internally standardized) residuals adjust raw residuals for their expected variance.

💡 Intuition

High leverage points have smaller expected residual variance because they influence the fit more. Studentizing corrects for this to put all residuals on comparable scale.

Depends: DX.1, OLS.1

Test: test_diagnostics_hypothesis.py::TestStudentizedResidualsFormula::test_studentized_residuals_formula

Reference: Cook & Weisberg, 1982
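The four diagnostics (DX.1–DX.4) can be computed in one short NumPy sketch (illustrative data, not the library's diagnostics module):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)                                  # leverages (DX.1, DX.2)
e = y - H @ y                                   # residuals
sigma2 = e @ e / (n - p)                        # residual variance estimate

r = e / np.sqrt(sigma2 * (1 - h))               # studentized residuals (DX.4)
D = (e**2 / (p * sigma2)) * (h / (1 - h)**2)    # Cook's distance (DX.3)
```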


Estimated Marginal Means (EMM)

EMM computation and variance propagation

EMM.1: $\text{EMMs} = \mathbf{X}_{\text{ref}}\,\hat{\boldsymbol{\beta}}$

Estimated marginal means are computed as linear combinations of the coefficient vector using the prediction matrix.

💡 Intuition

$\mathbf{X}_{\text{ref}}$ encodes the covariate values at each reference grid point. The EMM at point $i$ is simply the linear predictor $\mathbf{X}_{\text{ref},i}\,\hat{\boldsymbol{\beta}}$.

Depends: OLS.1

Enables: EMM.2

Test: test_emm_hypothesis.py::TestEMMLinearPrediction::test_emm_equals_xref_times_coef

Reference: Searle et al., 1980


EMM.2: $\text{Var}(\mathbf{X}_{\text{ref}}\,\hat{\boldsymbol{\beta}}) = \mathbf{X}_{\text{ref}}\,\mathbf{V}\,\mathbf{X}_{\text{ref}}^\top$

The variance-covariance of EMMs is computed by propagating the coefficient variance through the linear transformation.

💡 Intuition

This is the standard variance formula for linear combinations: $\text{Var}(\mathbf{A}\mathbf{x}) = \mathbf{A}\,\text{Var}(\mathbf{x})\,\mathbf{A}^\top$. Here $\mathbf{A} = \mathbf{X}_{\text{ref}}$ and $\mathbf{x} = \hat{\boldsymbol{\beta}}$.

Depends: EMM.1

Enables: INF.4

Test: test_emm_hypothesis.py::TestEMMVariancePropagation::test_vcov_emm_formula

Reference: Greene, 2018
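EMM.1 and EMM.2 together amount to a linear prediction plus variance propagation. A sketch with a hypothetical 2-row reference grid (the grid values and design are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([2.0, 0.5, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
V = (resid @ resid / (n - p)) * np.linalg.inv(X.T @ X)

# Hypothetical reference grid: two covariate settings to predict at
X_ref = np.array([[1.0, -1.0, 0.5],
                  [1.0,  1.0, 0.5]])

emm = X_ref @ beta_hat                 # EMM.1
vcov_emm = X_ref @ V @ X_ref.T         # EMM.2
se_emm = np.sqrt(np.diag(vcov_emm))    # INF.1 applied to the EMMs
```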


References
  1. Golub, G. H., & Van Loan, C. F. (2013). Matrix Computations (4th ed.). Johns Hopkins University Press.
  2. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer. https://hastie.su.domains/ElemStatLearn/
  3. Greene, W. H. (2018). Econometric Analysis (8th ed.). Pearson.
  4. McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). Chapman & Hall.
  5. Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Duxbury/Thomson Learning.
  6. Cook, R. D., & Weisberg, S. (1982). Residuals and Influence in Regression. Chapman.
  7. Searle, S. R., Speed, F. M., & Milliken, G. A. (1980). Population Marginal Means in the Linear Model: An Alternative to Least Squares Means. The American Statistician, 34(4), 216–221. https://doi.org/10.1080/00031305.1980.10483031