
Estimated Marginal Means

UCSD Psychology

After fitting a model, you often want to ask “what’s the predicted outcome for a typical observation in group A vs group B?” When your data has multiple predictors, the raw group means are muddied by differences in covariates across groups. Estimated marginal means (EMMs) answer this by computing predictions at meaningful reference points -- averaging over covariates to give you clean group comparisons that account for the model’s structure.

This page shows how EMMs are computed and how their uncertainty is quantified. The core idea is simple: construct a small matrix of “typical” predictor values (the reference grid), multiply by the fitted coefficients, and propagate uncertainty through the same linear algebra. By the end you’ll see that EMMs and their standard errors are direct applications of the OLS and inference machinery from the earlier pages.


What are marginal means?

The problem

You’ve fit a model with multiple predictors: y ~ treatment + age + gender. The raw group means of y are confounded by age and gender. If treatment group A happens to be older on average, their higher outcomes might reflect age rather than treatment. You want the treatment effect holding other variables at their means.

Building intuition

EMMs construct a reference grid -- specific predictor values at which to evaluate the model. For categorical variables, you cycle through all levels. For continuous variables, you typically hold them at their mean. Then you compute the model’s prediction at each grid point. The result is a set of “adjusted” group means that remove the influence of confounding covariates.

import numpy as np

# Simulate data where treatment groups have different age distributions
# (no random seed is set here, so exact numbers will vary run to run)
n_per_group = 50
n = 3 * n_per_group

# Group assignments
group = np.repeat([0, 1, 2], n_per_group)

# Age differs by group (confound!)
age = np.where(group == 0, np.random.normal(30, 5, n),
       np.where(group == 1, np.random.normal(40, 5, n),
                np.random.normal(50, 5, n)))
age_centered = age - age.mean()

# True model: y = group_effect + 0.5*age + noise
true_effects = np.array([10, 12, 11])
y = true_effects[group] + 0.5 * age_centered + np.random.normal(0, 2, n)

# Raw group means (confounded by age)
for g in range(3):
    mask = group == g
    print(f"Group {g}: raw mean = {y[mask].mean():.2f}, mean age = {age[mask].mean():.1f}")
Group 0: raw mean = 4.41, mean age = 28.9
Group 1: raw mean = 11.96, mean age = 40.8
Group 2: raw mean = 16.04, mean age = 49.8

EMMs as linear predictions

Building the reference grid

The reference grid is a small design matrix $\mathbf{X}_{\text{ref}}$ where each row represents one “typical” observation for a group of interest. For our 3-group example, we need 3 rows: one per treatment group, with age held at the overall mean (i.e., age_centered = 0).

# Design matrix (intercept, group dummies, age)
X = np.column_stack([
    np.ones(n),
    (group == 1).astype(float),
    (group == 2).astype(float),
    age_centered,
])

# Fit via OLS
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"Coefficients: {beta}")

# Reference grid: each group at mean age (age_centered = 0)
X_ref = np.array([
    [1, 0, 0, 0],  # Group 0, mean age
    [1, 1, 0, 0],  # Group 1, mean age
    [1, 0, 1, 0],  # Group 2, mean age
])
Coefficients: [9.1461 2.4006 2.5733 0.4334]

The formal statement

Theorem EMM.1: $\text{EMMs} = \mathbf{X}_{\text{ref}} \hat{\boldsymbol{\beta}}$

The EMMs are just matrix multiplication: each reference grid row dotted with the coefficient vector gives the model’s prediction for that “typical” observation.

emms = X_ref @ beta
for g in range(3):
    raw_mean = y[group == g].mean()
    print(f"Group {g}: EMM = {emms[g]:.2f}  (raw mean = {raw_mean:.2f})")
Group 0: EMM = 9.15  (raw mean = 4.41)
Group 1: EMM = 11.55  (raw mean = 11.96)
Group 2: EMM = 11.72  (raw mean = 16.04)

Why does this work?

$\mathbf{X}_{\text{ref}}$ encodes the covariate values at each reference grid point. The EMM at point $i$ is simply the linear predictor $\mathbf{X}_{\text{ref}}[i] \cdot \hat{\boldsymbol{\beta}}$. By holding covariates at their means, we remove confounding and get “adjusted” group means. Notice how the EMMs are closer to the true effects (10, 12, 11) than the raw means, which were distorted by the age confound.
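Another way to see the “adjustment” concretely: for a linear model with a centered covariate, the reference-grid EMM equals what you get by predicting the outcome for every observation as if it belonged to group g (keeping each observation’s own age), then averaging those predictions. The sketch below checks this equivalence; the seed and the compact data-generating code are illustrative, not the page’s exact simulation.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed, not from the page

# Compact re-simulation of the page's setup
n_per_group = 50
n = 3 * n_per_group
group = np.repeat([0, 1, 2], n_per_group)
age = rng.normal(30 + 10 * group, 5)   # mean age shifts by group
age_c = age - age.mean()
y = np.array([10, 12, 11])[group] + 0.5 * age_c + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), group == 1, group == 2, age_c]).astype(float)
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Route 1: reference-grid EMMs, age held at its mean (age_c = 0)
X_ref = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0]], float)
emms = X_ref @ beta

# Route 2: counterfactual averaging -- predict y for *every* observation
# "as if" it were in group g, keeping its own age, then average
emms_avg = np.empty(3)
for g in range(3):
    X_cf = X.copy()
    X_cf[:, 1] = float(g == 1)
    X_cf[:, 2] = float(g == 2)
    emms_avg[g] = (X_cf @ beta).mean()

# Linearity + centered age means the two routes agree exactly
print(np.allclose(emms, emms_avg))  # True
```

The agreement holds because averaging the linear predictor over observations sends the centered age term to zero, leaving exactly the reference-grid row.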


Variance propagation

The problem

EMMs are linear combinations of $\hat{\boldsymbol{\beta}}$, so they inherit the uncertainty of $\hat{\boldsymbol{\beta}}$. How do we compute standard errors for EMMs?

Building intuition

The variance of a linear transformation $\mathbf{A}\mathbf{x}$ is:

$$\text{Var}(\mathbf{A}\mathbf{x}) = \mathbf{A} \, \text{Var}(\mathbf{x}) \, \mathbf{A}^\top$$

Applied to EMMs: $\text{Var}(\mathbf{X}_{\text{ref}} \hat{\boldsymbol{\beta}}) = \mathbf{X}_{\text{ref}} \, \mathbf{V} \, \mathbf{X}_{\text{ref}}^\top$, where $\mathbf{V}$ is the variance-covariance matrix of $\hat{\boldsymbol{\beta}}$.

# Variance-covariance of beta
residuals = y - X @ beta
sigma2 = np.sum(residuals**2) / (n - X.shape[1])
vcov = sigma2 * np.linalg.inv(X.T @ X)

# Variance of EMMs
vcov_emm = X_ref @ vcov @ X_ref.T
se_emm = np.sqrt(np.diag(vcov_emm))

print("EMM variance-covariance:")
print(vcov_emm)
print(f"\nStandard errors: {se_emm}")
EMM variance-covariance:
[[ 0.1846 -0.0098 -0.1015]
 [-0.0098  0.0742  0.0089]
 [-0.1015  0.0089  0.1659]]

Standard errors: [0.4297 0.2724 0.4074]

The formal statement

Theorem EMM.2: $\text{Var}(\mathbf{X}_{\text{ref}} \hat{\boldsymbol{\beta}}) = \mathbf{X}_{\text{ref}} \, \mathbf{V} \, \mathbf{X}_{\text{ref}}^\top$

Why the sandwich form?

This is the standard variance formula for linear combinations: $\text{Var}(\mathbf{A}\mathbf{x}) = \mathbf{A} \, \text{Var}(\mathbf{x}) \, \mathbf{A}^\top$. Here $\mathbf{A} = \mathbf{X}_{\text{ref}}$ and $\mathbf{x} = \hat{\boldsymbol{\beta}}$. The off-diagonal elements of $\mathbf{X}_{\text{ref}} \, \mathbf{V} \, \mathbf{X}_{\text{ref}}^\top$ tell you how the EMMs co-vary -- important for pairwise comparisons, where the variance of a difference depends on the covariance between the two quantities being subtracted.
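The covariance term can be checked numerically: expanding $\text{Var}(\text{EMM}_2 - \text{EMM}_0)$ by hand with the off-diagonal entry gives the same number as applying a contrast row directly to $\mathbf{V}$. A minimal self-contained sketch (the seed and compact simulation are illustrative, not the page’s exact data):

```python
import numpy as np

rng = np.random.default_rng(1)  # illustrative seed

# Compact re-simulation of the page's setup
n_per_group = 50
n = 3 * n_per_group
group = np.repeat([0, 1, 2], n_per_group)
age = rng.normal(30 + 10 * group, 5)
age_c = age - age.mean()
y = np.array([10, 12, 11])[group] + 0.5 * age_c + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), group == 1, group == 2, age_c]).astype(float)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
vcov = (resid @ resid) / (n - X.shape[1]) * np.linalg.inv(X.T @ X)

X_ref = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0]], float)
V_emm = X_ref @ vcov @ X_ref.T

# Var(EMM_2 - EMM_0), expanded by hand: the covariance term matters
var_manual = V_emm[2, 2] + V_emm[0, 0] - 2 * V_emm[0, 2]

# Same quantity via a contrast row on beta (EMM_2 - EMM_0 = beta_2)
c = np.array([0, 0, 1, 0], float)
var_contrast = c @ vcov @ c

print(np.isclose(var_manual, var_contrast))  # True
```

Dropping the `- 2 * V_emm[0, 2]` term would overstate (or understate) the uncertainty of the difference whenever the two EMMs are correlated, which they always are here since both involve the intercept.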


From EMMs to contrasts

Pairwise comparisons

Once you have EMMs, you typically want to compare them. A contrast is a linear combination of EMMs that answers a specific question, like “does Group 1 differ from Group 0?” Contrasts are defined by a contrast matrix $\mathbf{L}$ applied to $\hat{\boldsymbol{\beta}}$, and their standard errors follow from the same variance propagation formula.

# Contrast: Group 1 - Group 0
L = np.array([[0, 1, 0, 0]])  # picks out the Group 1 dummy
diff_10 = L @ beta
se_diff = np.sqrt(L @ vcov @ L.T)[0, 0]
print(f"Group 1 - Group 0: {diff_10[0]:.2f} +/- {se_diff:.2f}")

# All pairwise: (1-0), (2-0), (2-1)
L_all = np.array([
    [0, 1, 0, 0],   # Group 1 - Group 0
    [0, 0, 1, 0],   # Group 2 - Group 0
    [0, -1, 1, 0],  # Group 2 - Group 1
])
diffs = L_all @ beta
se_diffs = np.sqrt(np.diag(L_all @ vcov @ L_all.T))
for i, label in enumerate(["1-0", "2-0", "2-1"]):
    print(f"  {label}: {diffs[i]:.2f} +/- {se_diffs[i]:.2f}")
Group 1 - Group 0: 2.40 +/- 0.53
  1-0: 2.40 +/- 0.53
  2-0: 2.57 +/- 0.74
  2-1: 0.17 +/- 0.47

Connection to inference

Note that the Wald statistic from Inference can test whether these contrasts are significant. The variance propagation formula EMM.2 feeds directly into the Wald formula INF.4: once you have a contrast estimate and its standard error, you form $t = \text{estimate} / \text{SE}$ and compare against the $t$-distribution.
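Putting the pieces together, the contrast estimates and standard errors from above turn into Wald tests with a few extra lines of `scipy.stats`. A self-contained sketch (seed and compact simulation are illustrative; no multiplicity correction is applied):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # illustrative seed

# Compact re-simulation of the page's setup
n_per_group = 50
n = 3 * n_per_group
group = np.repeat([0, 1, 2], n_per_group)
age = rng.normal(30 + 10 * group, 5)
age_c = age - age.mean()
y = np.array([10, 12, 11])[group] + 0.5 * age_c + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), group == 1, group == 2, age_c]).astype(float)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
df = n - X.shape[1]                                  # residual degrees of freedom
vcov = (resid @ resid) / df * np.linalg.inv(X.T @ X)

# Pairwise contrasts on beta: (1-0), (2-0), (2-1)
L_all = np.array([[0, 1, 0, 0], [0, 0, 1, 0], [0, -1, 1, 0]], float)
est = L_all @ beta
se = np.sqrt(np.diag(L_all @ vcov @ L_all.T))        # EMM.2 applied to L

# Wald t statistics and two-sided p-values
t_stats = est / se
p_vals = 2 * stats.t.sf(np.abs(t_stats), df)
for label, t, p in zip(["1-0", "2-0", "2-1"], t_stats, p_vals):
    print(f"{label}: t = {t:.2f}, p = {p:.4g}")
```

With several simultaneous comparisons you would normally adjust these p-values (e.g., Tukey or Bonferroni), which is exactly what dedicated EMM software does by default.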


Summary

| Theorem | Statement | What it means |
| --- | --- | --- |
| EMM.1 | $\text{EMMs} = \mathbf{X}_{\text{ref}} \hat{\boldsymbol{\beta}}$ | Marginal means as linear predictions |
| EMM.2 | $\text{Var}(\text{EMMs}) = \mathbf{X}_{\text{ref}} \, \mathbf{V} \, \mathbf{X}_{\text{ref}}^\top$ | Uncertainty propagation for EMMs |