Inference, estimation, and decision-making from data
Run one test at α = 0.05 and you have a 5% chance of a false positive. Run twenty independent tests and, even if nothing is real, you'll probably get at least one "significant" result by pure luck. This is the multiple testing problem, and it silently corrupts a huge amount of research and ML experimentation.
The chance of at least one false positive across m tests, the family-wise error rate, balloons: with m independent tests at level α it's 1 − (1 − α)m. For m = 20, α = 0.05, that's about 64%, more likely than not to find a phantom effect.
Buy a single lottery ticket and your odds of winning are tiny. Buy a thousand and one of them might "win" something purely by chance, even though you have no special insight at all. Running many statistical tests is the same gamble: with enough tries, a meaningless fluke will eventually cross the significance line and masquerade as a real discovery.