Fitting a model is only half the job. A model might produce coefficient estimates, confidence intervals, and p-values --- but none of that means much if a handful of observations are unduly warping the results. Diagnostics tell you whether the model is trustworthy by identifying observations that exert outsized influence on the fit.
This page covers the three core tools for spotting problematic observations: leverage (how extreme is a point’s position in predictor space?), studentized residuals (how poorly does the model predict a point, adjusted for its leverage?), and Cook’s distance (how much do the fitted coefficients change if we remove a point?). Together, they form a complete picture of observation-level influence. Every theorem is backed by a property-based test in bossanova.
## Leverage: who pulls the hardest?

### The problem
Not all observations are created equal. A data point far from the center of the predictor space has more influence on the regression line --- it acts as a fulcrum, pulling the fitted line toward itself. Leverage quantifies exactly how much pull each observation has.
### Building intuition
Recall the hat matrix $H = X(X^\top X)^{-1}X^\top$ that maps $y$ to $\hat{y}$. The diagonal element $h_{ii}$ measures how much observation $i$'s $y$-value influences its own fitted value $\hat{y}_i$. Points with high leverage dominate their own prediction.
```python
import numpy as np

n = 30
x = np.concatenate([np.random.normal(0, 1, n - 1), [10.0]])  # one extreme point
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

print(f"Typical leverage: {np.mean(leverage[:-1]):.4f}")
print(f"Extreme point: {leverage[-1]:.4f}")
print(f"Average leverage: {np.mean(leverage):.4f}")
```

```
Typical leverage: 0.0408
Extreme point: 0.8166
Average leverage: 0.0667
```
### Leverage sum equals p

**Theorem DX.1:** $\sum_{i=1}^{n} h_{ii} = p$, where $p$ is the number of parameters.
```python
p = X.shape[1]
print(f"sum(leverage) = {np.sum(leverage):.4f}")
print(f"p = {p}")
print(f"Match? {np.isclose(np.sum(leverage), p)}")
```

```
sum(leverage) = 2.0000
p = 2
Match? True
```
Why does this work?
Since $\sum_i h_{ii} = \operatorname{tr}(H)$ and $\operatorname{tr}(H) = p$ (from OLS.6), the total leverage budget is fixed. Each observation “consumes” some of this budget. High-leverage points consume a disproportionate share --- and every unit of leverage they take comes at the expense of the remaining observations.
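The trace identity can be checked numerically. This is a standalone sketch with a hypothetical random design (not the running example above); it relies on the cyclic property of the trace, $\operatorname{tr}(X(X^\top X)^{-1}X^\top) = \operatorname{tr}((X^\top X)^{-1}X^\top X) = \operatorname{tr}(I_p) = p$:

```python
import numpy as np

# Hypothetical design: intercept plus three random predictors.
rng = np.random.default_rng(0)
n, p = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# tr(H) = tr(X (X'X)^{-1} X') = tr((X'X)^{-1} X'X) = tr(I_p) = p
H = X @ np.linalg.inv(X.T @ X) @ X.T
print(f"{np.trace(H):.4f}")  # 4.0000
```

Note the check holds for any full-rank design, not just this one: the trace collapses to $p$ regardless of the predictor values.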
### Leverage bounds

**Theorem DX.2:** $\frac{1}{n} \le h_{ii} \le 1$ for all observations (for a model with an intercept).
```python
print(f"Min leverage: {np.min(leverage):.4f} (lower bound 1/n = {1/n:.4f})")
print(f"Max leverage: {np.max(leverage):.4f} (upper bound = 1)")
print(f"All in [1/n, 1]? {np.all((leverage >= 1/n - 1e-10) & (leverage <= 1 + 1e-10))}")
```

```
Min leverage: 0.0333 (lower bound 1/n = 0.0333)
Max leverage: 0.8166 (upper bound = 1)
All in [1/n, 1]? True
```
The lower bound means each point gets at least an equal share of the leverage budget. The upper bound 1 means a single point can at most completely determine its own fitted value --- the model passes exactly through it.
Why these bounds?
The lower bound comes from the intercept column: every observation contributes equally through the intercept. The upper bound 1 comes from $H$ being a projection matrix (its eigenvalues are 0 or 1, so its diagonal elements are bounded by 1).
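The projection-matrix argument can be verified directly. This sketch (again on a hypothetical random design) computes the eigenvalues of $H$ and confirms they are all 0 or 1, with exactly $p$ ones:

```python
import numpy as np

# Hypothetical design: intercept plus two random predictors.
rng = np.random.default_rng(1)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T

# H is symmetric, so use the symmetric eigensolver.
eigvals = np.linalg.eigvalsh(H)

# Every eigenvalue of a projection matrix is 0 or 1 ...
assert np.all(np.isclose(eigvals, 0) | np.isclose(eigvals, 1))
# ... and exactly p of them are 1, matching the rank of X.
assert np.isclose(eigvals.sum(), p)
```

Since every diagonal element of a symmetric matrix lies between its smallest and largest eigenvalues, the bound $h_{ii} \le 1$ follows immediately.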
## Studentized residuals

### The problem
Raw residuals don’t account for the fact that high-leverage points tend to have smaller residuals. This is counterintuitive but makes sense: high-leverage points pull the regression line toward themselves, which shrinks their own residuals. A small raw residual at a high-leverage point might actually be more surprising than a large residual at a typical point.
### Building intuition
Studentized residuals adjust each residual by its estimated standard deviation, which depends on leverage:

$$r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}}$$

The denominator shrinks for high-leverage points (because $h_{ii}$ is larger), which inflates the standardized residual. This puts all residuals on a comparable scale, making genuine outliers easier to spot.
```python
# Generate data with one outlier
y = 3 + 2 * x + np.random.normal(0, 1, n)
y[-1] = 50  # outlier at the extreme x-value

beta = np.linalg.lstsq(X, y, rcond=None)[0]
residuals = y - X @ beta
sigma = np.sqrt(np.sum(residuals**2) / (n - 2))
studentized = residuals / (sigma * np.sqrt(1 - leverage))

print(f"Raw residual (outlier): {residuals[-1]:.4f}")
print(f"Studentized residual (outlier): {studentized[-1]:.4f}")
```

```
Raw residual (outlier): 5.3180
Studentized residual (outlier): 4.9373
```
### The formal statement

**Theorem DX.4:** $r_i = \dfrac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}}$, where $e_i$ is the raw residual and $h_{ii}$ the leverage of observation $i$.
Why does dividing by $\sqrt{1 - h_{ii}}$ help?
High-leverage points have $\operatorname{Var}(e_i) = \sigma^2(1 - h_{ii})$, which is smaller than $\sigma^2$ because they influence the fit more. Dividing by $\hat{\sigma}\sqrt{1 - h_{ii}}$ standardizes all residuals to a comparable scale, making outliers easier to detect. Without this adjustment, a high-leverage outlier can hide behind its artificially small raw residual.
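The variance claim $\operatorname{Var}(e_i) = \sigma^2(1 - h_{ii})$ can be checked by simulation. This sketch fixes a design with one extreme point (mirroring the setup above, but as a self-contained hypothetical example), redraws the errors many times, and compares empirical residual variances to theory:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, n_sims = 30, 1.0, 20000
x = np.concatenate([rng.normal(0, 1, n - 1), [10.0]])  # fixed design, one extreme point
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

# Residuals are e = (I - H) eps (the signal X*beta is annihilated by I - H),
# so Var(e_i) = sigma^2 * (1 - h_ii).
eps = rng.normal(0, sigma, size=(n_sims, n))
resid = eps @ (np.eye(n) - H).T
empirical_var = resid.var(axis=0)

print(f"High-leverage point: empirical {empirical_var[-1]:.3f} "
      f"vs theory {sigma**2 * (1 - h[-1]):.3f}")
```

The high-leverage point's residual variance comes out far below $\sigma^2 = 1$, which is exactly why its raw residual understates how surprising it is.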
## Cook’s distance: combining leverage and residuals

### The problem
A point can be influential in two ways: it can have high leverage (extreme predictor values) or a large residual (doesn’t fit the pattern), or both. Neither measure alone tells the full story. Cook’s distance combines them into a single influence measure.
### Building intuition
Cook’s distance factors into two pieces:

$$D_i = \frac{e_i^2}{p\,\hat{\sigma}^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}$$

The first factor captures residual magnitude. The second captures leverage. Large Cook’s distance means the point substantially changes the fitted coefficients when removed. A point needs both a notable residual and some leverage to be truly influential --- a large residual at a low-leverage point barely moves the line, and a high-leverage point that fits the pattern well doesn’t distort the fit.
```python
cooks_d = (residuals**2 / (2 * sigma**2)) * (leverage / (1 - leverage)**2)

print(f"Cook's D (outlier): {cooks_d[-1]:.4f}")
print(f"Cook's D (typical): {np.median(cooks_d[:-1]):.4f}")
```

```
Cook's D (outlier): 54.2664
Cook's D (typical): 0.0060
```
### The formal statement

**Theorem DX.3:** $D_i = \dfrac{(\hat{\beta} - \hat{\beta}_{(i)})^\top X^\top X\,(\hat{\beta} - \hat{\beta}_{(i)})}{p\,\hat{\sigma}^2} = \dfrac{e_i^2}{p\,\hat{\sigma}^2} \cdot \dfrac{h_{ii}}{(1 - h_{ii})^2}$, where $\hat{\beta}_{(i)}$ is the fit with observation $i$ removed.
What does Cook’s distance measure?
Cook’s distance measures the squared distance between the coefficient vector with and without observation $i$, normalized by the covariance of $\hat{\beta}$. Large residuals OR high leverage can produce large $D_i$. The most influential points combine both --- they sit far from the center of the data and don’t follow the pattern established by the other observations.
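That leave-one-out interpretation can be verified brute-force. This sketch (on hypothetical simulated data, not the page's running example) refits the model without each observation and confirms the resulting coefficient distance matches the closed-form $D_i$ exactly:

```python
import numpy as np

# Hypothetical data: one extreme-x outlier, as in the examples above.
rng = np.random.default_rng(3)
n, p = 30, 2
x = np.concatenate([rng.normal(0, 1, n - 1), [10.0]])
X = np.column_stack([np.ones(n), x])
y = 3 + 2 * x + rng.normal(0, 1, n)
y[-1] += 20  # outlier at the extreme x-value

beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
sigma2 = np.sum(e**2) / (n - p)

# Closed-form Cook's distance
D = (e**2 / (p * sigma2)) * (h / (1 - h) ** 2)

# Direct leave-one-out version: refit without observation i, then measure
# D_i = (beta - beta_i)' X'X (beta - beta_i) / (p * sigma2)
D_direct = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    diff = beta - beta_i
    D_direct[i] = diff @ (X.T @ X) @ diff / (p * sigma2)

print(f"Formulas agree? {np.allclose(D, D_direct)}")  # True
```

The agreement is exact (up to floating point), because the closed form follows from the rank-one downdate identity $\hat{\beta} - \hat{\beta}_{(i)} = (X^\top X)^{-1} x_i\, e_i / (1 - h_{ii})$.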
## Seeing it in bossanova
```python
from bossanova.internal.maths.inference import (
    compute_leverage,
    compute_cooks_distance,
    compute_studentized_residuals,
)

XtX_inv = np.linalg.inv(X.T @ X)
h = compute_leverage(X, weights=None, XtWX_inv=XtX_inv)

mse = np.sum(residuals**2) / (n - X.shape[1])
sigma_hat = np.sqrt(mse)
cooks = compute_cooks_distance(residuals, h, sigma_hat, X.shape[1])
stud_r = compute_studentized_residuals(residuals, h, sigma_hat)

print(f"Max leverage: {np.max(h):.4f} (observation {np.argmax(h)})")
print(f"Max Cook's D: {np.max(cooks):.4f} (observation {np.argmax(cooks)})")
print(f"Max |studentized residual|: {np.max(np.abs(stud_r)):.4f}")
```

```
Max leverage: 0.8166 (observation 29)
Max Cook's D: 54.2664 (observation 29)
Max |studentized residual|: 4.9373
```
## Putting it together: identifying influential observations
In practice, you don’t look at one diagnostic in isolation. The standard approach is to flag observations that exceed common thresholds, then investigate the flagged points to decide whether they are data errors, genuinely unusual cases, or signs that the model is misspecified.
```python
# Common diagnostic thresholds
threshold_leverage = 2 * X.shape[1] / n  # 2p/n rule of thumb
threshold_cooks = 4 / n                  # 4/n rule of thumb
threshold_resid = 2.0                    # |r| > 2

flagged_leverage = np.where(leverage > threshold_leverage)[0]
flagged_cooks = np.where(cooks_d > threshold_cooks)[0]
flagged_resid = np.where(np.abs(studentized) > threshold_resid)[0]

print(f"High leverage (>{threshold_leverage:.2f}): observations {flagged_leverage}")
print(f"High Cook's D (>{threshold_cooks:.2f}): observations {flagged_cooks}")
print(f"Large residual (>2.0): observations {flagged_resid}")
```

```
High leverage (>0.13): observations [29]
High Cook's D (>0.13): observations [29]
Large residual (>2.0): observations [ 6 20 29]
```
## Summary
| Theorem | Statement | What it means |
|---|---|---|
| DX.1 | $\sum_{i=1}^{n} h_{ii} = p$ | Total leverage is fixed |
| DX.2 | $1/n \le h_{ii} \le 1$ | Leverage is bounded |
| DX.3 | $D_i = \frac{e_i^2}{p\,\hat{\sigma}^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}$ | Combines residual and leverage |
| DX.4 | $r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}}$ | Adjusts residuals for leverage |