Marginal vs Conditional Effects

UCSD Psychology

In mixed models, there are two fundamentally different ways to interpret the effects of predictors. Conditional effects describe what happens for a specific group -- a particular classroom, patient, or experimental subject. They answer: “given that I know which group this observation belongs to, what’s the expected outcome?” Marginal effects describe what happens on average across all groups, integrating over the distribution of random effects. They answer: “for a randomly chosen observation from the population, what’s the expected outcome?”

For linear mixed models, the distinction is often academic -- marginal and conditional slopes are identical because the identity link function preserves additivity. But for generalized linear mixed models (GLMMs), the two can differ dramatically. The nonlinear link function (logit, log, etc.) means that averaging predictions across groups is not the same as predicting at the average random effect. Understanding this distinction is crucial for interpreting mixed model results correctly, and for knowing what your software is actually reporting.


Linear mixed models: the easy case

Intercepts cancel in slopes (MAR.1)

In a random-intercept LMM, each group has its own intercept but the same slope. When computing the effect of a predictor (the change in $y$ per unit change in $x$), the group intercepts cancel out. This means the conditional slope (within any specific group) and the marginal slope (averaged across groups) are the same.

import numpy as np

n_groups = 5
n_per = 30
n = n_groups * n_per
group = np.repeat(np.arange(n_groups), n_per)

# Random intercepts, common slope
group_intercepts = np.array([60, 65, 70, 75, 80])
true_slope = 2.0
x = np.random.normal(0, 1, n)
y = group_intercepts[group] + true_slope * x + np.random.normal(0, 1, n)

# Conditional prediction: y_i = (intercept_g + slope * x)
# Marginal prediction:   E[y] = E[intercept_g] + slope * x = grand_mean + slope * x
# The SLOPE is the same either way!

for g in range(n_groups):
    mask = group == g
    # Group-specific OLS slope
    x_g = x[mask]
    y_g = y[mask]
    slope_g = np.cov(x_g, y_g)[0, 1] / np.var(x_g, ddof=1)  # ddof=1 to match np.cov's default
    print(f"Group {g} slope: {slope_g:.3f}")
print(f"\nTrue slope:     {true_slope:.3f}")
print("→ Marginal slope = Conditional slope (intercepts cancel)")
Group 0 slope: 2.221
Group 1 slope: 2.035
Group 2 slope: 1.723
Group 3 slope: 2.075
Group 4 slope: 1.926

True slope:     2.000
→ Marginal slope = Conditional slope (intercepts cancel)

Property MAR.1: In a random-intercept LMM, marginal and conditional slopes are identical because group intercepts cancel when computing derivatives.

Why do the intercepts cancel?

For a random-intercept model, $E[y \mid \text{group}=g, x] = \alpha_g + \beta x$. The slope $\partial E[y] / \partial x = \beta$ regardless of $g$. Marginalizing over groups: $E[y \mid x] = E[\alpha_g] + \beta x$, so the marginal slope is also $\beta$. The intercepts shift each group's line up or down, but the slope -- the rate of change with respect to $x$ -- is shared across all groups and survives marginalization.
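This argument can be checked numerically: build the five conditional lines (using the same intercepts and slope as the simulation above), average them into the marginal line, and confirm that every slope is the same $\beta$.

```python
import numpy as np

# Group intercepts and shared slope from the simulation above
alphas = np.array([60.0, 65.0, 70.0, 75.0, 80.0])
beta = 2.0
x = np.linspace(-2, 2, 9)

# Conditional lines: one per group; marginal line: average over groups
cond_lines = alphas[:, None] + beta * x      # shape (5, 9)
marg_line = cond_lines.mean(axis=0)          # = mean(alphas) + beta * x

# Slopes via finite differences: all equal beta
cond_slopes = np.diff(cond_lines, axis=1) / np.diff(x)
marg_slope = np.diff(marg_line) / np.diff(x)
print(np.allclose(cond_slopes, beta), np.allclose(marg_slope, beta))  # → True True
```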


Conditional = Fixed + Random (MAR.2)

The conditional mean for a specific group combines two components: the population-level prediction (fixed effects) and the group-specific deviation (random effects). This decomposition is what makes mixed models interpretable -- you can see both the “typical” prediction and how each group deviates from it.

Property MAR.2: The conditional mean for group $g$ is: $E[y \mid b_g] = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z} b_g$

# Population prediction (fixed effects only)
grand_intercept = np.mean(group_intercepts)
y_pop = grand_intercept + true_slope * np.linspace(-2, 2, 5)

# Group-specific prediction (fixed + random)
print(f"Population intercept: {grand_intercept}")
print(f"\nConditional means = Fixed + Random:")
for g in range(n_groups):
    offset = group_intercepts[g] - grand_intercept
    print(f"  Group {g}: {grand_intercept:.0f} + {offset:+.0f} = {group_intercepts[g]:.0f}")
Population intercept: 70.0

Conditional means = Fixed + Random:
  Group 0: 70 + -10 = 60
  Group 1: 70 + -5 = 65
  Group 2: 70 + +0 = 70
  Group 3: 70 + +5 = 75
  Group 4: 70 + +10 = 80

Balanced design equivalence (MAR.3)

With balanced designs (equal group sizes), the marginal mean equals the simple average of group means. This is the simplest case: no group is over- or under-represented, so the population average is just the arithmetic mean of the group-level predictions.

Property MAR.3: With balanced designs (equal group sizes), the marginal mean equals the simple average of group means.

# Balanced design: each group has exactly n_per observations
group_means = np.array([y[group == g].mean() for g in range(n_groups)])
marginal_mean = y.mean()
avg_of_group_means = group_means.mean()

print("Group means:")
for g in range(n_groups):
    print(f"  Group {g}: {group_means[g]:.2f}")
print(f"\nMarginal mean (overall):         {marginal_mean:.2f}")
print(f"Average of group means:          {avg_of_group_means:.2f}")
print(f"Difference:                      {abs(marginal_mean - avg_of_group_means):.4f}")
print("\n→ With balanced groups, these are identical")
Group means:
  Group 0: 59.90
  Group 1: 64.82
  Group 2: 70.07
  Group 3: 74.82
  Group 4: 79.92

Marginal mean (overall):         69.91
Average of group means:          69.91
Difference:                      0.0000

→ With balanced groups, these are identical

GLMMs: where things get interesting

In GLMMs, the random effects are additive on the link scale (e.g., log-odds for logistic regression), not on the response scale (probabilities). This is a key distinction: a shift of +1 on the log-odds scale produces a different change in probability depending on where you start. Near $p = 0.5$, a log-odds shift has a large effect on probability; near $p = 0$ or $p = 1$, the same shift has a much smaller effect.
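A quick illustration of this point: apply the same +1 log-odds shift at three different baseline probabilities and compare the resulting changes (using `scipy.special.logit` and `expit`).

```python
from scipy.special import expit, logit

# The same +1 shift on the log-odds scale changes the probability by
# different amounts depending on the starting probability
for p0 in [0.05, 0.50, 0.95]:
    p1 = expit(logit(p0) + 1.0)
    print(f"p = {p0:.2f} → {p1:.3f}  (change: {p1 - p0:+.3f})")
```

The change is largest at $p = 0.5$ and shrinks toward the boundaries.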

Property MAR.4: On the link scale, $\eta_g = \mathbf{X}\boldsymbol{\beta} + b_g$ (additive). On the response scale, $\mu_g = g^{-1}(\mathbf{X}\boldsymbol{\beta} + b_g)$ (nonlinear).

from scipy.special import expit

# Logistic GLMM: log-odds are additive
beta_fixed = np.array([-1.0, 0.5])  # intercept, slope

# 5 groups with varying intercepts (on log-odds scale)
random_intercepts = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])

x_grid = np.linspace(-3, 3, 7)
print(f"{'x':>4}  {'Pop prob':>9}  " + "  ".join(f"Group {g}" for g in range(5)))
for x_val in x_grid:
    eta_pop = beta_fixed[0] + beta_fixed[1] * x_val
    p_pop = expit(eta_pop)
    group_probs = [expit(eta_pop + b) for b in random_intercepts]
    pop_str = f"{p_pop:.3f}"
    grp_str = "  ".join(f"{p:.3f}" for p in group_probs)
    print(f"{x_val:>4.0f}  {pop_str:>9}  {grp_str}")
   x   Pop prob  Group 0  Group 1  Group 2  Group 3  Group 4
  -3      0.076  0.018  0.047  0.076  0.119  0.269
  -2      0.119  0.029  0.076  0.119  0.182  0.378
  -1      0.182  0.047  0.119  0.182  0.269  0.500
   0      0.269  0.076  0.182  0.269  0.378  0.622
   1      0.378  0.119  0.269  0.378  0.500  0.731
   2      0.500  0.182  0.378  0.500  0.622  0.818
   3      0.622  0.269  0.500  0.622  0.731  0.881

Jensen’s inequality: the core issue (MAR.5)

The problem

For GLMMs, the marginal mean $E[\mu] = E[g^{-1}(\mathbf{X}\boldsymbol{\beta} + b)]$ is NOT the same as $g^{-1}(\mathbf{X}\boldsymbol{\beta} + E[b]) = g^{-1}(\mathbf{X}\boldsymbol{\beta})$.

This is Jensen's inequality: for a concave function $f$, $E[f(x)] \leq f(E[x])$ (with the inequality reversed for a convex $f$). The logistic function is concave above its midpoint and convex below it, so averaging the transformed values is not the same as transforming the average.

Property MAR.5: Due to Jensen’s inequality, the marginal probability is NOT simply the inverse-link applied to the fixed effects alone.

# Jensen's inequality in action
# E[expit(eta + b)] ≠ expit(eta + E[b]) = expit(eta) in general

eta = 0.0  # population log-odds = 0 → prob = 0.5
sigma_b = 1.5  # random intercept SD

# Monte Carlo: average expit over random intercepts
b_samples = np.random.normal(0, sigma_b, 10000)
marginal_prob = np.mean(expit(eta + b_samples))
conditional_prob = expit(eta)  # at E[b] = 0

print(f"Conditional prob (at b=0):   {conditional_prob:.4f}")
print(f"Marginal prob (averaged):    {marginal_prob:.4f}")
print(f"Difference:                  {marginal_prob - conditional_prob:.4f}")
print("\n→ At eta = 0 the gap is essentially zero: expit is antisymmetric")
print("  around 0.5, so the Jensen gap vanishes there and grows as eta moves away")
Conditional prob (at b=0):   0.5000
Marginal prob (averaged):    0.4993
Difference:                  -0.0007

→ At eta = 0 the gap is essentially zero: expit is antisymmetric
  around 0.5, so the Jensen gap vanishes there and grows as eta moves away
Why does averaging pull the probability toward 0.5?

The logistic function is concave above $p = 0.5$ and convex below. Averaging over random intercepts therefore pulls the marginal probability toward 0.5: when the conditional probability at $b = 0$ is above 0.5 the marginal probability is lower, and when it is below 0.5 the marginal probability is higher. Intuitively, the logistic curve flattens near 0 and 1 -- groups with extreme random effects are "squashed" toward the boundaries, while groups near the center move more freely. The net effect is that the population-average probability is less extreme than the conditional probability evaluated at the mean random effect ($b = 0$). The larger the random effect variance $\sigma_b^2$, the stronger this shrinkage. (At exactly $p = 0.5$ the pulls from either side cancel by symmetry, so the gap there is essentially zero.)
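To see the gap appear once the conditional probability moves away from 0.5, here is a small Monte Carlo check at a conditional log-odds of 1 (same $\sigma_b = 1.5$ as above; the seed and sample size are arbitrary choices):

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(0)
eta = 1.0          # conditional log-odds away from the symmetric point
sigma_b = 1.5      # same random-intercept SD as above
b = rng.normal(0, sigma_b, 100_000)

p_cond = expit(eta)              # conditional prob at b = 0 (≈ 0.731)
p_marg = expit(eta + b).mean()   # marginal prob, shrunk toward 0.5

print(f"Conditional: {p_cond:.3f}")
print(f"Marginal:    {p_marg:.3f}  (pulled toward 0.5)")
```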


The attenuation effect (MAR.6)

The problem

Marginal slopes on the response scale are smaller in magnitude than conditional slopes in GLMMs. This is a direct consequence of Jensen's inequality applied to the derivative of the inverse-link function. Random effects add variability on the link scale, and when that variability is transformed through the nonlinear inverse link, the resulting average curve is flatter than any single group's curve.

Property MAR.6: The marginal slope in a GLMM is attenuated relative to the conditional slope: $\beta_{\text{marginal}} < \beta_{\text{conditional}}$.

# Attenuation: marginal slope < conditional slope
beta_cond = 0.5  # conditional (within-group) slope

# Marginal slope via Monte Carlo
x_lo, x_hi = -0.5, 0.5  # small step
b_samples = np.random.normal(0, sigma_b, 10000)

p_lo = np.mean(expit(beta_fixed[0] + beta_cond * x_lo + b_samples))
p_hi = np.mean(expit(beta_fixed[0] + beta_cond * x_hi + b_samples))
marginal_slope_prob = (p_hi - p_lo) / (x_hi - x_lo)

# Conditional slope at b=0
p_lo_cond = expit(beta_fixed[0] + beta_cond * x_lo)
p_hi_cond = expit(beta_fixed[0] + beta_cond * x_hi)
cond_slope_prob = (p_hi_cond - p_lo_cond) / (x_hi - x_lo)

print(f"Conditional slope (prob scale): {cond_slope_prob:.4f}")
print(f"Marginal slope (prob scale):    {marginal_slope_prob:.4f}")
print(f"Attenuation ratio:              {marginal_slope_prob / cond_slope_prob:.3f}")
print(f"\n→ Marginal slope is ~{(1 - marginal_slope_prob/cond_slope_prob)*100:.0f}% smaller")
Conditional slope (prob scale): 0.0981
Marginal slope (prob scale):    0.0796
Attenuation ratio:              0.812

→ Marginal slope is ~19% smaller
Why are marginal slopes always smaller?

Random intercepts add “noise” to the link-scale prediction. When transformed through the nonlinear link function, this noise averages out to flatter curves. Think of it this way: some groups have their logistic curve shifted left, others shifted right. When you average all these shifted S-curves together, you get a shallower S-curve. More group variation (larger $\sigma_b$) means more attenuation. This is why you should always report which type of effect you’re interpreting in GLMM results -- conditional and marginal slopes can differ substantially.
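The dependence on the random-effect SD can be checked directly by recomputing the attenuation ratio at several values of $\sigma_b$ (a sketch reusing the fixed-effect values from the earlier cells; the $\sigma$ grid and seed are arbitrary choices):

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(42)
intercept, beta_cond = -1.0, 0.5   # same fixed effects as above
x_lo, x_hi = -0.5, 0.5

ratios = []
for sigma in [0.5, 1.0, 2.0]:
    b = rng.normal(0, sigma, 200_000)
    # Marginal slope: average probabilities over random intercepts, then difference
    p_lo = expit(intercept + beta_cond * x_lo + b).mean()
    p_hi = expit(intercept + beta_cond * x_hi + b).mean()
    marginal = (p_hi - p_lo) / (x_hi - x_lo)
    # Conditional slope evaluated at b = 0
    conditional = (expit(intercept + beta_cond * x_hi)
                   - expit(intercept + beta_cond * x_lo)) / (x_hi - x_lo)
    ratios.append(marginal / conditional)
    print(f"sigma_b = {sigma}: attenuation ratio = {ratios[-1]:.3f}")
```

The ratio falls further below 1 as $\sigma_b$ grows, matching the claim above.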


The attenuation effect only occurs on the response scale. On the link scale (log-odds for logistic, log for Poisson), marginal and conditional effects are identical -- just as they are in linear models. The discrepancy is entirely introduced by the nonlinear transformation from the link scale to the response scale.

Property MAR.7: On the link scale, marginal and conditional effects are the same -- the attenuation only occurs when transforming to the response scale.

# On log-odds scale: marginal = conditional
print("Log-odds scale (LINEAR):")
print(f"  Conditional slope: {beta_cond:.4f}")
print(f"  Marginal slope:    {beta_cond:.4f}  (same!)")

print(f"\nProbability scale (NONLINEAR):")
print(f"  Conditional slope: {cond_slope_prob:.4f}")
print(f"  Marginal slope:    {marginal_slope_prob:.4f}  (attenuated!)")
Log-odds scale (LINEAR):
  Conditional slope: 0.5000
  Marginal slope:    0.5000  (same!)

Probability scale (NONLINEAR):
  Conditional slope: 0.0981
  Marginal slope:    0.0796  (attenuated!)

Practical implications

The table below summarizes when marginal and conditional effects agree and when they diverge. The key factor is the link function: identity links preserve the equivalence, while nonlinear links (logit, log) introduce attenuation.

print("When are marginal and conditional effects the same?")
print("=" * 55)
print(f"{'Model':>20}  {'Link scale':>12}  {'Response scale':>15}")
print(f"{'LMM (identity)':>20}  {'Same':>12}  {'Same':>15}")
print(f"{'GLMM (logit)':>20}  {'Same':>12}  {'Different':>15}")
print(f"{'GLMM (log)':>20}  {'Same':>12}  {'Different':>15}")
print(f"\n→ Attenuation increases with random effect variance")
print(f"→ For LMMs, this distinction doesn't matter")
print(f"→ For GLMMs, always specify which you're reporting")
When are marginal and conditional effects the same?
=======================================================
               Model    Link scale   Response scale
      LMM (identity)          Same             Same
        GLMM (logit)          Same        Different
          GLMM (log)          Same        Different

→ Attenuation increases with random effect variance
→ For LMMs, this distinction doesn't matter
→ For GLMMs, always specify which you're reporting

Summary

| Property | Statement | When it matters |
| --- | --- | --- |
| MAR.1 | Intercepts cancel in slopes | LMM: marginal = conditional slopes |
| MAR.2 | Conditional = Fixed + Random | Interpreting group-specific predictions |
| MAR.3 | Balanced design equivalence | Marginal mean = average of group means |
| MAR.4 | Link scale additivity | Random effects additive on link scale |
| MAR.5 | Jensen’s inequality | Marginal ≠ conditional on response scale |
| MAR.6 | Attenuation effect | Marginal slopes < conditional slopes |
| MAR.7 | Link scale invariance | No attenuation on link scale |
