
Estimated Marginal Means

UCSD Psychology

After fitting a model, you often want to ask “what’s the predicted outcome for a typical observation in group A vs group B?” When your data has multiple predictors, the raw group means are muddied by differences in covariates across groups. Estimated marginal means (EMMs) answer this by computing predictions at meaningful reference points -- averaging over covariates to give you clean group comparisons that account for the model’s structure.

This page shows how EMMs are computed and how their uncertainty is quantified. The core idea is simple: construct a small matrix of “typical” predictor values (the reference grid), multiply by the fitted coefficients, and propagate uncertainty through the same linear algebra. By the end you’ll see that EMMs and their standard errors are direct applications of the OLS and inference machinery from the earlier pages.


What are marginal means?

The problem

You’ve fit a model with multiple predictors: y ~ treatment + age + gender. The raw group means of y are confounded by age and gender. If treatment group A happens to be older on average, their higher outcomes might reflect age rather than treatment. You want the treatment effect holding other variables at their means.

Building intuition

EMMs construct a reference grid -- specific predictor values at which to evaluate the model. For categorical variables, you cycle through all levels. For continuous variables, you typically hold them at their mean. Then you compute the model’s prediction at each grid point. The result is a set of “adjusted” group means that remove the influence of confounding covariates.

import numpy as np

# Simulate data where treatment groups have different age distributions
# (no random seed is set here, so exact numbers will vary run to run)
n_per_group = 50
n = 3 * n_per_group

# Group assignments
group = np.repeat([0, 1, 2], n_per_group)

# Age differs by group (confound!)
age = np.where(group == 0, np.random.normal(30, 5, n),
       np.where(group == 1, np.random.normal(40, 5, n),
                np.random.normal(50, 5, n)))
age_centered = age - age.mean()

# True model: y = group_effect + 0.5*age + noise
true_effects = np.array([10, 12, 11])
y = true_effects[group] + 0.5 * age_centered + np.random.normal(0, 2, n)

# Raw group means (confounded by age)
for g in range(3):
    mask = group == g
    print(f"Group {g}: raw mean = {y[mask].mean():.2f}, mean age = {age[mask].mean():.1f}")
Group 0: raw mean = 4.41, mean age = 28.9
Group 1: raw mean = 11.96, mean age = 40.8
Group 2: raw mean = 16.04, mean age = 49.8

EMMs as linear predictions

Building the reference grid

The reference grid is a small design matrix $\mathbf{X}_{\text{ref}}$ where each row represents one “typical” observation for a group of interest. For our 3-group example, we need 3 rows: one per treatment group, with age held at the overall mean (i.e., age_centered = 0).

# Design matrix (intercept, group dummies, age)
X = np.column_stack([
    np.ones(n),
    (group == 1).astype(float),
    (group == 2).astype(float),
    age_centered,
])

# Fit via OLS
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"Coefficients: {beta}")

# Reference grid: each group at mean age (age_centered = 0)
X_ref = np.array([
    [1, 0, 0, 0],  # Group 0, mean age
    [1, 1, 0, 0],  # Group 1, mean age
    [1, 0, 1, 0],  # Group 2, mean age
])
Coefficients: [9.1461 2.4006 2.5733 0.4334]

The formal statement

Theorem EMM.1: $\text{EMMs} = \mathbf{X}_{\text{ref}} \hat{\boldsymbol{\beta}}$

The EMMs are just matrix multiplication: each reference grid row dotted with the coefficient vector gives the model’s prediction for that “typical” observation.

emms = X_ref @ beta
for g in range(3):
    raw_mean = y[group == g].mean()
    print(f"Group {g}: EMM = {emms[g]:.2f}  (raw mean = {raw_mean:.2f})")
Group 0: EMM = 9.15  (raw mean = 4.41)
Group 1: EMM = 11.55  (raw mean = 11.96)
Group 2: EMM = 11.72  (raw mean = 16.04)

Why does this work?

$\mathbf{X}_{\text{ref}}$ encodes the covariate values at each reference grid point. The EMM at point $i$ is simply the linear predictor $\mathbf{X}_{\text{ref}}[i] \cdot \hat{\boldsymbol{\beta}}$. By holding covariates at their means, we remove confounding and get “adjusted” group means. Notice how the EMMs are closer to the true effects (10, 12, 11) than the raw means, which were distorted by the age confound.
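Another way to see the “adjustment” concretely: for a linear model with a centered covariate, the reference-grid EMM equals what you get by predicting the outcome for every observation as if it belonged to group g (keeping each observation’s own age), then averaging those predictions. The sketch below checks this equivalence; the seed and the compact data-generating code are illustrative, not the page’s exact simulation.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed, not from the page

# Compact re-simulation of the page's setup
n_per_group = 50
n = 3 * n_per_group
group = np.repeat([0, 1, 2], n_per_group)
age = rng.normal(30 + 10 * group, 5)   # mean age shifts by group
age_c = age - age.mean()
y = np.array([10, 12, 11])[group] + 0.5 * age_c + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), group == 1, group == 2, age_c]).astype(float)
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Route 1: reference-grid EMMs, age held at its mean (age_c = 0)
X_ref = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0]], float)
emms = X_ref @ beta

# Route 2: counterfactual averaging -- predict y for *every* observation
# "as if" it were in group g, keeping its own age, then average
emms_avg = np.empty(3)
for g in range(3):
    X_cf = X.copy()
    X_cf[:, 1] = float(g == 1)
    X_cf[:, 2] = float(g == 2)
    emms_avg[g] = (X_cf @ beta).mean()

# Linearity + centered age means the two routes agree exactly
print(np.allclose(emms, emms_avg))  # True
```

The agreement holds because averaging the linear predictor over observations sends the centered age term to zero, leaving exactly the reference-grid row.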


Variance propagation

The problem

EMMs are linear combinations of $\hat{\boldsymbol{\beta}}$, so they inherit the uncertainty of $\hat{\boldsymbol{\beta}}$. How do we compute standard errors for EMMs?

Building intuition

The variance of a linear transformation $\mathbf{A}\mathbf{x}$ is:

$$\text{Var}(\mathbf{A}\mathbf{x}) = \mathbf{A} \, \text{Var}(\mathbf{x}) \, \mathbf{A}^\top$$

Applied to EMMs: $\text{Var}(\mathbf{X}_{\text{ref}} \hat{\boldsymbol{\beta}}) = \mathbf{X}_{\text{ref}} \, \mathbf{V} \, \mathbf{X}_{\text{ref}}^\top$, where $\mathbf{V}$ is the variance-covariance matrix of $\hat{\boldsymbol{\beta}}$.

# Variance-covariance of beta
residuals = y - X @ beta
sigma2 = np.sum(residuals**2) / (n - X.shape[1])
vcov = sigma2 * np.linalg.inv(X.T @ X)

# Variance of EMMs
vcov_emm = X_ref @ vcov @ X_ref.T
se_emm = np.sqrt(np.diag(vcov_emm))

print("EMM variance-covariance:")
print(vcov_emm)
print(f"\nStandard errors: {se_emm}")
EMM variance-covariance:
[[ 0.1846 -0.0098 -0.1015]
 [-0.0098  0.0742  0.0089]
 [-0.1015  0.0089  0.1659]]

Standard errors: [0.4297 0.2724 0.4074]

The formal statement

Theorem EMM.2: $\text{Var}(\mathbf{X}_{\text{ref}} \hat{\boldsymbol{\beta}}) = \mathbf{X}_{\text{ref}} \, \mathbf{V} \, \mathbf{X}_{\text{ref}}^\top$

Why the sandwich form?

This is the standard variance formula for linear combinations: $\text{Var}(\mathbf{A}\mathbf{x}) = \mathbf{A} \, \text{Var}(\mathbf{x}) \, \mathbf{A}^\top$. Here $\mathbf{A} = \mathbf{X}_{\text{ref}}$ and $\mathbf{x} = \hat{\boldsymbol{\beta}}$. The off-diagonal elements of $\mathbf{X}_{\text{ref}} \, \mathbf{V} \, \mathbf{X}_{\text{ref}}^\top$ tell you how the EMMs co-vary -- important for pairwise comparisons, where the variance of a difference depends on the covariance between the two quantities being subtracted.
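The covariance term can be checked numerically: expanding $\text{Var}(\text{EMM}_2 - \text{EMM}_0)$ by hand with the off-diagonal entry gives the same number as applying a contrast row directly to $\mathbf{V}$. A minimal self-contained sketch (the seed and compact simulation are illustrative, not the page’s exact data):

```python
import numpy as np

rng = np.random.default_rng(1)  # illustrative seed

# Compact re-simulation of the page's setup
n_per_group = 50
n = 3 * n_per_group
group = np.repeat([0, 1, 2], n_per_group)
age = rng.normal(30 + 10 * group, 5)
age_c = age - age.mean()
y = np.array([10, 12, 11])[group] + 0.5 * age_c + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), group == 1, group == 2, age_c]).astype(float)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
vcov = (resid @ resid) / (n - X.shape[1]) * np.linalg.inv(X.T @ X)

X_ref = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0]], float)
V_emm = X_ref @ vcov @ X_ref.T

# Var(EMM_2 - EMM_0), expanded by hand: the covariance term matters
var_manual = V_emm[2, 2] + V_emm[0, 0] - 2 * V_emm[0, 2]

# Same quantity via a contrast row on beta (EMM_2 - EMM_0 = beta_2)
c = np.array([0, 0, 1, 0], float)
var_contrast = c @ vcov @ c

print(np.isclose(var_manual, var_contrast))  # True
```

Dropping the `- 2 * V_emm[0, 2]` term would overstate (or understate) the uncertainty of the difference whenever the two EMMs are correlated, which they always are here since both involve the intercept.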


From EMMs to contrasts

Pairwise comparisons

Once you have EMMs, you typically want to compare them. A contrast is a linear combination of EMMs that answers a specific question, like “does Group 1 differ from Group 0?” Contrasts are defined by a contrast matrix $\mathbf{L}$ applied to $\hat{\boldsymbol{\beta}}$, and their standard errors follow from the same variance propagation formula.

# Contrast: Group 1 - Group 0
L = np.array([[0, 1, 0, 0]])  # picks out the Group 1 dummy
diff_10 = L @ beta
se_diff = np.sqrt(L @ vcov @ L.T)[0, 0]
print(f"Group 1 - Group 0: {diff_10[0]:.2f} +/- {se_diff:.2f}")

# All pairwise: (1-0), (2-0), (2-1)
L_all = np.array([
    [0, 1, 0, 0],   # Group 1 - Group 0
    [0, 0, 1, 0],   # Group 2 - Group 0
    [0, -1, 1, 0],  # Group 2 - Group 1
])
diffs = L_all @ beta
se_diffs = np.sqrt(np.diag(L_all @ vcov @ L_all.T))
for i, label in enumerate(["1-0", "2-0", "2-1"]):
    print(f"  {label}: {diffs[i]:.2f} +/- {se_diffs[i]:.2f}")
Group 1 - Group 0: 2.40 +/- 0.53
  1-0: 2.40 +/- 0.53
  2-0: 2.57 +/- 0.74
  2-1: 0.17 +/- 0.47

Connection to inference

Note that the Wald statistic from Inference can test whether these contrasts are significant. The variance propagation formula EMM.2 feeds directly into the Wald formula INF.4: once you have a contrast estimate and its standard error, you form $t = \text{estimate} / \text{SE}$ and compare against the $t$-distribution.
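Putting the pieces together, the contrast estimates and standard errors from above turn into Wald tests with a few extra lines of `scipy.stats`. A self-contained sketch (seed and compact simulation are illustrative; no multiplicity correction is applied):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # illustrative seed

# Compact re-simulation of the page's setup
n_per_group = 50
n = 3 * n_per_group
group = np.repeat([0, 1, 2], n_per_group)
age = rng.normal(30 + 10 * group, 5)
age_c = age - age.mean()
y = np.array([10, 12, 11])[group] + 0.5 * age_c + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), group == 1, group == 2, age_c]).astype(float)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
df = n - X.shape[1]                                  # residual degrees of freedom
vcov = (resid @ resid) / df * np.linalg.inv(X.T @ X)

# Pairwise contrasts on beta: (1-0), (2-0), (2-1)
L_all = np.array([[0, 1, 0, 0], [0, 0, 1, 0], [0, -1, 1, 0]], float)
est = L_all @ beta
se = np.sqrt(np.diag(L_all @ vcov @ L_all.T))        # EMM.2 applied to L

# Wald t statistics and two-sided p-values
t_stats = est / se
p_vals = 2 * stats.t.sf(np.abs(t_stats), df)
for label, t, p in zip(["1-0", "2-0", "2-1"], t_stats, p_vals):
    print(f"{label}: t = {t:.2f}, p = {p:.4g}")
```

With several simultaneous comparisons you would normally adjust these p-values (e.g., Tukey or Bonferroni), which is exactly what dedicated EMM software does by default.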


Summary

| Theorem | Statement | What it means |
| --- | --- | --- |
| EMM.1 | $\text{EMMs} = \mathbf{X}_{\text{ref}} \hat{\boldsymbol{\beta}}$ | Marginal means as linear predictions |
| EMM.2 | $\text{Var}(\text{EMMs}) = \mathbf{X}_{\text{ref}} \, \mathbf{V} \, \mathbf{X}_{\text{ref}}^\top$ | Uncertainty propagation for EMMs |