Regression to the Mean: Why Exceptional Performance Doesn't Last
Morgan Voss
A cat has an exceptional hunting day and catches six mice. Her usual rate is closer to one or two. The next day she catches one. It is tempting to explain the decline. She tired herself out. The hunting ground was temporarily depleted. The sudden abundance made her complacent.
All of these explanations may be true. None of them is necessary. The more statistically complete explanation is that six was an unusual result, and results tend not to stay unusual.
Galton's Observation
Francis Galton noticed this in the 1880s while studying the heights of parents and their children. Tall parents tended to have tall children, but the children were typically less extreme in height than their parents. Short parents had short children, but again the children were closer to the population average. The offspring regressed toward the mean of the population.
Galton called this "regression toward mediocrity." The word "regression" in linear regression is borrowed from this observation: the original context was this tendency of extreme values to be followed by less extreme ones.
The mechanism is not specifically hereditary or biological. It is a consequence of imperfect correlation.
The Mathematical Structure
Suppose $X$ and $Y$ are bivariate normal with means $\mu_X$ and $\mu_Y$, standard deviations $\sigma_X$ and $\sigma_Y$, and correlation $\rho$. The conditional expectation of $Y$ given $X = x$ is:
$$E[Y \mid X = x] = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X)$$
When $|\rho| < 1$, and it almost always is for real phenomena, the predicted value of $Y$ is pulled toward $\mu_Y$ relative to how far $x$ is from $\mu_X$. The more extreme the value of $X$, the more pronounced this pull.
At $\rho = 1$, variables move in perfect lockstep and there is no regression effect. At $\rho = 0$, the best prediction of $Y$ is just $\mu_Y$ regardless of $x$. Every real system sits somewhere between those extremes, and so every real system exhibits regression to the mean.
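To make the formula concrete, here is a minimal Python sketch of the bivariate normal conditional mean, mu_y + rho * (sigma_y / sigma_x) * (x - mu_x). The function name and the numbers (mean 100, standard deviation 15, an observed value of 130) are hypothetical and chosen only for illustration.

```python
def conditional_mean(x, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Expected value of Y given X = x for a bivariate normal pair (X, Y)."""
    return mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)

# Hypothetical setup: both variables have mean 100 and standard deviation 15,
# and we observe X = 130, which is two standard deviations above the mean.
predictions = {
    rho: conditional_mean(130, 100, 100, 15, 15, rho)
    for rho in (1.0, 0.5, 0.0)
}

for rho, pred in predictions.items():
    print(f"rho = {rho}: predicted Y = {pred}")
```

With perfect correlation the prediction stays at 130. With a correlation of 0.5 it falls halfway back to the mean, at 115. With no correlation it is simply the mean, 100.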
For the hunting cat: if we model day-to-day catch as draws from a distribution with some true individual average, with random variation around it, then the best prediction for any cat's next-day catch is somewhere between yesterday's result and the population mean. The more exceptional yesterday was, the more it likely reflected favorable random conditions, and the less it predicts tomorrow.
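The cat model can be simulated directly. The sketch below is one possible setup, not the only reasonable one: it assumes each cat makes 20 hunting attempts a day and has her own fixed success probability between 5% and 10%, so her true average is between 1 and 2 mice. All names and parameters are hypothetical.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

N_CATS = 20_000
ATTEMPTS = 20  # assumed number of hunting attempts per day

def day_catch(p):
    """Mice caught in one day: ATTEMPTS tries, each succeeding with probability p."""
    return sum(rng.random() < p for _ in range(ATTEMPTS))

# Each cat's true skill is fixed; only the day-to-day luck varies.
skills = [rng.uniform(0.05, 0.10) for _ in range(N_CATS)]
day1 = [day_catch(p) for p in skills]
day2 = [day_catch(p) for p in skills]

# Select the cats whose first day was exceptional (5 or more mice).
stars = [i for i in range(N_CATS) if day1[i] >= 5]

overall_mean = statistics.mean(day1)
star_day1 = statistics.mean(day1[i] for i in stars)
star_day2 = statistics.mean(day2[i] for i in stars)
star_true = statistics.mean(skills[i] * ATTEMPTS for i in stars)

print(f"all cats, day 1:        {overall_mean:.2f}")
print(f"star cats, day 1:       {star_day1:.2f}")
print(f"star cats, day 2:       {star_day2:.2f}")
print(f"star cats, true average: {star_true:.2f}")
```

Nothing about the star cats changes between the two days in this model. Their second-day average drops sharply anyway, landing near their true average, because the selection on day one picked out favorable luck along with skill.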
Why This Gets Misread
The regression effect is invisible to anyone not looking for it, because it is always accompanied by a plausible story. The athlete who had an exceptional game is praised. The following week she performs at a more ordinary level, and the explanation offered is that the praise made her overconfident, or she relaxed after the good result. The explanation fits. It is also probably irrelevant.
This becomes consequential when evaluating interventions. A student scores very poorly on an exam and receives tutoring. On the next exam, she scores higher. The tutoring gets the credit. But students who score at the bottom of one exam tend to score higher on the next even without intervention, because their first result likely reflected a worse-than-usual performance. The intervention appears effective in part because it was applied at a point when improvement was statistically predictable.
The same pattern appears in medical contexts. Patients tend to seek treatment when their symptoms are at their worst. Many conditions fluctuate naturally, so by the time the treatment could plausibly act, some improvement would have occurred anyway. Disentangling the treatment effect from regression to the mean requires a control group that did not receive the intervention.
How to Test for It
The key question is whether outcomes improve after an extreme baseline measurement even without the intervention. If patients randomly assigned to a control group (no treatment, just watchful waiting) also improve following an extreme baseline measurement, regression to the mean is at work. The treatment effect is whatever improvement exceeds the improvement seen in the control group.
Without a control group, there is no way to separate the regression effect from the treatment effect. Studies that track only treated individuals following an extreme baseline event are particularly vulnerable to this confound. A dramatic reversal in outcomes is exactly what the statistics predict, with or without the intervention.
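A small simulation shows why the control group matters. This is a sketch under stated assumptions: exam scores are a stable ability plus independent noise on each sitting, the bottom 10% on the first exam are selected, half of them are randomly tutored, and tutoring has a true benefit of 3 points. Every number here is hypothetical.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

N = 50_000
TRUE_EFFECT = 3.0  # assumed tutoring benefit in points, for illustration only

ability = [rng.gauss(70, 10) for _ in range(N)]
exam1 = [a + rng.gauss(0, 10) for a in ability]

# Select the bottom 10% on the first exam: the extreme-baseline group.
cutoff = sorted(exam1)[N // 10]
selected = [i for i in range(N) if exam1[i] < cutoff]

# Randomly assign half of the selected students to tutoring.
rng.shuffle(selected)
half = len(selected) // 2
treated, control = selected[:half], selected[half:]

def exam2(i, tutored):
    """Second exam: same ability, fresh noise, plus the benefit if tutored."""
    return ability[i] + rng.gauss(0, 10) + (TRUE_EFFECT if tutored else 0.0)

gain_treated = statistics.mean(exam2(i, True) - exam1[i] for i in treated)
gain_control = statistics.mean(exam2(i, False) - exam1[i] for i in control)
estimated_effect = gain_treated - gain_control

print(f"gain, tutored:   {gain_treated:.1f}")
print(f"gain, untutored: {gain_control:.1f}")
print(f"estimated effect: {estimated_effect:.1f}")
```

The untutored students improve substantially with no intervention at all. Looking only at the tutored group would credit tutoring with the whole gain. Subtracting the control group's gain recovers an estimate close to the assumed 3 points.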
The Direction Is Symmetric
It works both ways. A cat who has an unusually quiet hunting day is expected to return to her average the next day. A fund manager who has an exceptionally poor year is predicted to perform closer to the mean in the following year, as is one who had an exceptionally good year. This symmetry is useful: if your model predicts regression only in one direction, the model is probably picking up something other than the pure statistical phenomenon.
The regression effect is also not about time specifically. It applies to any two variables that are imperfectly correlated, regardless of which is "first." If you select the cats with the longest left rear paws, their right rear paws will on average be less extreme: still longer than typical, but shorter than the left paws that got them selected. Not because the left paw caused the right paw to shorten. Because the extreme selection on one measurement imperfectly predicts the other.
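The paw example can be checked in both directions. The sketch below assumes paw length is a shared body-size component plus independent left and right variation, which makes the two paws correlated but not perfectly. The lengths in millimetres and the function name are hypothetical.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the sketch is repeatable
N = 50_000

# Assumed model: shared size plus independent per-paw variation.
size = [rng.gauss(40, 2) for _ in range(N)]
left = [s + rng.gauss(0, 1) for s in size]
right = [s + rng.gauss(0, 1) for s in size]

def top_group_means(select_on, other, fraction=0.05):
    """Means of both measurements among the top `fraction` ranked by `select_on`."""
    k = int(len(select_on) * fraction)
    order = sorted(range(len(select_on)), key=lambda i: select_on[i], reverse=True)
    top = order[:k]
    return (statistics.mean(select_on[i] for i in top),
            statistics.mean(other[i] for i in top))

# Select on the left paw and look at the right, then the reverse.
sel_left, other_right = top_group_means(left, right)
sel_right, other_left = top_group_means(right, left)

print(f"longest left paws:  left {sel_left:.2f}, right {other_right:.2f}")
print(f"longest right paws: right {sel_right:.2f}, left {other_left:.2f}")
```

Whichever paw is used for selection comes out more extreme than the other one. There is no time order and no causal direction in the model, only imperfect correlation.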
What This Means in Practice
Regression to the mean does not mean that extraordinary performance is impossible, that interventions cannot work, or that exceptional individuals are merely lucky. It means that an extreme measurement is more likely than a typical one to include a large contribution from noise, and that noise does not persist.
The signal persists. A genuinely skilled hunter will have a higher average than an unskilled one, and that difference is real. But a single observation of six mice tells you something about the cat and something about the day, and separating those contributions requires more data than a single data point can provide.
When the next observation is less extreme, the correct response is usually not to construct an explanation. It is to update the estimate of the average.
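Updating the estimate can be as simple as a running average. The sketch below is one minimal way to do it, assuming every day counts equally. The function name and the catch counts are hypothetical.

```python
def update_mean(mean, count, observation):
    """Fold one new observation into a running average."""
    count += 1
    mean += (observation - mean) / count
    return mean, count

catches = [1, 2, 6, 1]  # hypothetical daily catches, including one exceptional day

mean, count = 0.0, 0
history = []
for c in catches:
    mean, count = update_mean(mean, count, c)
    history.append(mean)

print(history)  # [1.0, 1.5, 3.0, 2.5]
```

The six-mouse day raises the estimate to 3.0, and the ordinary day that follows brings it back to 2.5. Neither step needs a story about the cat.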
