Central Limit Theorem

The mathematics of uncertainty

The law of large numbers says the sample mean converges to μ. But how does it get there, and what does the leftover wobble look like? The central limit theorem gives a striking answer: for independent samples with finite variance, the wobble is approximately Gaussian, whatever the shape of the distribution you started from.

Average enough independent, identically distributed samples with finite variance and the standardized average is approximately standard normal, even if the originals were coin flips, dice, or some lopsided distribution. This is why the bell curve shows up so often: a sum of many small independent effects, none of which dominates, tends to look Gaussian.
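Stated a little more formally, here is the classical version of the theorem. It assumes the samples X_1, …, X_n are independent and identically distributed with mean μ and finite variance σ² > 0. The symbols X_i and X̄_n are introduced here for illustration and are not the article's own notation.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr)}{\sigma}
\;\xrightarrow{\;d\;}\; \mathcal{N}(0, 1)
\quad \text{as } n \to \infty .
```

Read informally: for large n the sample mean behaves roughly like a normal variable centered at μ with standard deviation σ/√n, which is the "wobble" the law of large numbers leaves unexplained.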

The figure averages n rolls of a fair die and histograms the result over many trials. At n = 1 the histogram is flat (uniform); crank n up and a bell emerges, the CLT building a Gaussian out of a non-Gaussian source.
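The dice experiment can be approximated in a few lines of standard-library Python. This is only a sketch, not the code behind the figure. The function names, the trial count, and the use of excess kurtosis as a numeric stand-in for "how bell-shaped is the histogram" are all assumptions made here. Excess kurtosis is 0 for a Gaussian and clearly negative for a flat distribution.

```python
import random

SIGMA_DIE = (35 / 12) ** 0.5  # standard deviation of a single fair-die roll


def dice_means(n, trials, rng):
    """Return `trials` sample means, each the average of n fair-die rolls."""
    return [sum(rng.randint(1, 6) for _ in range(n)) / n for _ in range(trials)]


def std_and_excess_kurtosis(values):
    """Population standard deviation and excess kurtosis (0 for a Gaussian)."""
    count = len(values)
    mean = sum(values) / count
    m2 = sum((v - mean) ** 2 for v in values) / count  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / count  # fourth central moment
    return m2 ** 0.5, m4 / m2 ** 2 - 3.0


rng = random.Random(0)  # fixed seed so reruns give the same numbers
for n in (1, 2, 5, 30):
    std, kurt = std_and_excess_kurtosis(dice_means(n, 20_000, rng))
    predicted = SIGMA_DIE / n ** 0.5  # CLT prediction for the spread of the mean
    print(f"n={n:>2}  std={std:.3f}  predicted={predicted:.3f}  excess kurtosis={kurt:+.3f}")
```

As n grows, the measured spread should track σ/√n and the excess kurtosis should drift toward 0, which is the numeric counterpart of the bell emerging in the histogram.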

Where this lives in ML

The CLT explains the noise structure of stochastic optimization. A mini-batch gradient is an average over batch examples, so by the CLT its error around the true gradient is approximately Gaussian with spread σ/√(batch size), where σ is the spread of the per-example gradients. That's why gradient noise looks roughly normal, why larger batches give smoother steps (but only by a square-root factor: quadrupling the batch merely halves the noise), and why error bars on benchmark accuracies are computed…
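The σ/√(batch size) scaling can be checked on a toy problem. The sketch below is an illustration under stated assumptions, not a real training loop. It uses a hypothetical one-parameter least-squares model, treats the full-dataset average as the "true" gradient, and draws mini-batches with replacement. All names in it (`per_example_grad`, `batch_grad_std`) are made up for this example.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical toy data: y = 2x + noise, fitted by the model y ≈ w * x.
xs = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]


def per_example_grad(w, x, y):
    """Gradient of the per-example loss 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x


w = 0.0  # evaluate the gradients at an arbitrary parameter value
grads = [per_example_grad(w, x, y) for x, y in zip(xs, ys)]
true_grad = statistics.fmean(grads)  # full-dataset gradient
sigma = statistics.pstdev(grads)     # spread of the per-example gradients


def batch_grad_std(batch_size, trials=4000):
    """Empirical std of (mini-batch gradient - true gradient) over many batches."""
    errors = []
    for _ in range(trials):
        batch = rng.choices(grads, k=batch_size)  # sample with replacement
        errors.append(statistics.fmean(batch) - true_grad)
    return statistics.pstdev(errors)


for b in (16, 64, 256):
    print(f"batch={b:>3}  measured={batch_grad_std(b):.4f}  sigma/sqrt(b)={sigma / b ** 0.5:.4f}")
```

Each fourfold increase in batch size should cut the measured noise roughly in half, matching the square-root scaling described above.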