Survivorship Bias: The Sample You're Not Seeing
Morgan Voss
The stray cats in your neighborhood look healthy, resourceful, and resilient. They navigate traffic, find food, and manage weather that would concern a pampered indoor cat. It is easy to conclude, watching them, that outdoor cats do well.
What you are not seeing is every outdoor cat that did not survive the process of becoming a stray you could observe. They are not there to be counted. The sample you have is not outdoor cats in general. It is outdoor cats who survived long enough to be noticed. That is a different population, with different characteristics, and it supports different conclusions.
This is survivorship bias. It occurs whenever the process of entering your sample is correlated with the outcome you are trying to study.
The WWII Planes
The best-known historical example comes from the Statistical Research Group, a classified group of American mathematicians consulting for the military during World War II. The problem was armor: where should engineers reinforce aircraft to reduce losses from enemy fire?
The military had data. Returning planes showed bullet holes concentrated in the fuselage and wings. The initial instinct was to reinforce those areas. Abraham Wald pointed out that the analysis was looking at the wrong planes.
The planes in the data were the ones that returned. Their bullet holes showed where a plane could take damage and still fly back. The planes missing from the data (those shot down) were not available for inspection. The areas with few bullet holes on returning planes were not areas that avoided being hit. They were areas where a hit was likely fatal.
Wald recommended reinforcing the areas that showed the least damage on the returning aircraft. The reasoning was that those were the areas where damage was not compatible with survival, and therefore not represented in the surviving sample.
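The logic can be illustrated with a small calculation. This is a sketch, not Wald's actual analysis: the section names, hit counts, and loss probabilities are hypothetical, and it assumes each plane takes a single hit and that hits are spread evenly across sections.

```python
# Hypothetical inputs for illustration only; none of these numbers are historical.
HITS_PER_SECTION = 1000  # assumption: every section is hit equally often

# Assumed probability that a plane is lost, given one hit in that section.
P_LOST_GIVEN_HIT = {
    "engine": 0.60,
    "cockpit": 0.50,
    "fuselage": 0.10,
    "wings": 0.05,
}


def hits_on_returning_planes(hits_per_section, p_lost_given_hit):
    """Expected hits per section that stay visible, i.e. on planes that returned."""
    return {
        section: round(hits_per_section * (1 - p_lost))
        for section, p_lost in p_lost_given_hit.items()
    }


visible = hits_on_returning_planes(HITS_PER_SECTION, P_LOST_GIVEN_HIT)
least_visible = min(visible, key=visible.get)
most_lethal = max(P_LOST_GIVEN_HIT, key=P_LOST_GIVEN_HIT.get)

for section, count in visible.items():
    print(f"{section:9s} visible hits: {count}")
print("fewest visible hits:", least_visible)
print("most lethal section:", most_lethal)
```

Every section is hit equally often in this setup, yet the section with the fewest holes on returning planes is the one where a hit is most often fatal. Counting holes on survivors ranks the sections in the reverse order of their vulnerability.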
The analysis required explicitly reasoning about what was absent and why.
The Formal Structure
Survivorship bias is a form of selection bias. The general problem: the sample is not drawn uniformly from the population of interest. Instead, there is a filter, and entry into the sample depends on passing through it. When the filter is correlated with the variable being measured, the sample systematically misrepresents the population.
Let $S$ be the event of surviving the filter (being observed). Let $Y$ be the outcome of interest. The concern is that:
$$E[Y \mid S] \neq E[Y]$$
The expected value of $Y$ conditional on surviving the filter differs from the unconditional expected value. Studies that report only $E[Y \mid S]$ and interpret it as $E[Y]$ are reporting a biased estimate of the quantity that actually matters.
The degree of bias depends on how correlated the filter is with $Y$ and what fraction of the population the filter removes. A filter that removes a small random subset introduces little bias. A filter that predominantly removes observations with extreme values of $Y$ introduces substantial bias, pulling the surviving sample's average away from the values that were removed.
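One way to see both dependencies at once is the law of total expectation. Here $S$ is the event of being observed and $Y$ is the outcome; $p = P(S)$ and $\bar{S}$ (the event of being filtered out) are notation introduced for this sketch:

$$E[Y] = p\,E[Y \mid S] + (1 - p)\,E[Y \mid \bar{S}]$$

Rearranging gives the bias of the survivor average:

$$E[Y \mid S] - E[Y] = (1 - p)\,\bigl(E[Y \mid S] - E[Y \mid \bar{S}]\bigr)$$

The bias is the fraction removed times the gap between those kept and those removed. A random filter makes the gap close to zero. A filter that removes very little makes $1 - p$ close to zero.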
Investment Funds
Mutual fund performance data is a practical example that affects real decisions. A fund company manages 50 funds. After ten years, 20 of them have been closed or merged into other funds, typically because their performance was poor. The remaining 30 funds are the ones available for prospective investors to evaluate.
The historical return data shown for those 30 funds looks good. Funds that underperformed were removed from the sample. The average performance of surviving funds is not the average performance of the company's funds generally. It is the average performance of the funds that did not fail.
This effect has been measured in the mutual fund industry. Studies that include only surviving funds overestimate long-run average returns relative to studies that account for all funds, including those that closed. Published estimates for U.S. equity funds are commonly on the order of one percentage point per year. Compounded over any realistic planning horizon, a bias of that size matters.
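The arithmetic of the 50-fund example can be sketched as follows. The returns are hypothetical, evenly spaced numbers rather than real data, and the sketch assumes the 20 closed funds are exactly the 20 worst performers, which is a stricter filter than real fund closures would follow.

```python
# Hypothetical fund family: 50 funds with annualized returns in basis points,
# evenly spaced from 3.00% to 7.90%. Illustrative numbers, not data.
returns_bp = [300 + 10 * i for i in range(50)]

# Assumption: the 20 worst performers are the ones closed or merged away.
N_CLOSED = 20
survivors_bp = sorted(returns_bp)[N_CLOSED:]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


all_funds_mean = mean(returns_bp)      # what the fund family actually delivered
survivors_mean = mean(survivors_bp)    # what a prospective investor is shown
bias_bp = survivors_mean - all_funds_mean

print(f"all 50 funds:  {all_funds_mean / 100:.2f}% per year")
print(f"30 survivors:  {survivors_mean / 100:.2f}% per year")
print(f"overstatement: {bias_bp / 100:.2f} percentage points per year")
```

No individual fund's record was altered. The overstatement comes entirely from which funds remain in the average, and its size here is an artifact of the chosen numbers.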
Startup Advice
The same mechanism operates wherever success stories are more visible than failure stories. Successful founders write memoirs and give conference talks. Failed founders generally do not. The advice that circulates in startup culture is filtered by the same process that created the advisors: survival.
This does not mean the advice is wrong. Some of the practices that correlate with success may genuinely cause success. But the advice cannot be evaluated from the set of successful founders alone, because founders who followed the same advice and failed are not in the sample. Observing that a practice is common among those who succeeded, without knowing how common it is among those who failed, is not evidence of a causal relationship.
To establish that a given practice improves the probability of success, you would need data on outcomes across all founders who followed it, including those who did not make it far enough to be interviewed, along with a comparable group of founders who did not follow it.
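A small worked example shows how a practice can look important among successes while having no effect. The counts below are hypothetical and were constructed so that the success rate is identical in both groups.

```python
# Hypothetical counts, constructed so that the practice has NO effect on success.
founders = {
    # group: (number of founders, number who succeeded)
    "followed_advice": (600, 30),
    "ignored_advice": (400, 20),
}

total_successes = sum(successes for _, successes in founders.values())

# What memoirs and conference talks reveal: only the successes.
share_of_successes_who_followed = founders["followed_advice"][1] / total_successes

# What the full population is needed for: the success rate within each group.
success_rate = {
    group: successes / count for group, (count, successes) in founders.items()
}

print(f"successful founders who followed the advice: {share_of_successes_who_followed:.0%}")
for group, rate in success_rate.items():
    print(f"success rate, {group}: {rate:.1%}")
```

Most successful founders in this example followed the advice, but only because most founders did. The comparison that matters is the success rate with and without the practice, and that requires counting the failures.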
Testing for It
Asking whether survivorship bias is present requires asking what the full population looked like before the filter. This is often difficult, sometimes impossible, and always worth attempting.
Useful questions: Who is missing from this dataset? What determines whether an observation appears? Is the reason for absence correlated with the outcome?
Sometimes the missing data can be recovered. Financial databases increasingly include delisted funds. Medical registries often track dropout patterns. Clinical trials are increasingly required to be pre-registered, which makes the set of conducted trials visible even when results are not published.
When recovery is not possible, the appropriate response is to characterize the potential bias, estimate its direction, and be honest about what the surviving sample can and cannot tell you. A conclusion drawn from survivors is a conclusion about survivors. Whether it generalizes to the population depends on what the filter removed.
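One simple way to characterize the potential bias is a worst-case bound: combine the survivor average with an assumed range for the missing group. The sketch below is one possible approach under stated assumptions, not a standard procedure from this article. The function name, its parameters, and the example numbers are all hypothetical.

```python
def population_mean_bounds(survivor_mean, fraction_observed, missing_low, missing_high):
    """Bound the population mean when only the survivor mean is known.

    Uses E[Y] = p * E[Y | observed] + (1 - p) * E[Y | missing], where the
    mean of the missing group is assumed to lie in [missing_low, missing_high].
    """
    if not 0 < fraction_observed <= 1:
        raise ValueError("fraction_observed must be in (0, 1]")
    if missing_low > missing_high:
        raise ValueError("missing_low must not exceed missing_high")
    p = fraction_observed
    lower = p * survivor_mean + (1 - p) * missing_low
    upper = p * survivor_mean + (1 - p) * missing_high
    return lower, upper


# Hypothetical example: survivors average 8.0, 60% of the population was
# observed, and the missing 40% are assumed to average between -2.0 and 8.0.
low, high = population_mean_bounds(8.0, 0.6, -2.0, 8.0)
print(f"population mean is between {low:.1f} and {high:.1f}")
```

The width of the interval is an honest statement of how much the survivors alone cannot tell you. It narrows only when more of the population is observed or when there is a defensible reason to tighten the assumed range for the missing group.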
The stray cats you can see are the ones the world kept. Knowing that does not tell you very much about the ones it didn't.
