Measures of Spread

Inference, estimation, and decision-making from data

A center tells you where the data sits; spread tells you how much it wiggles around that center. Two data sets can share the same mean and be wildly different: one tightly clustered, one all over the place. Spread is the difference.

The workhorse is variance: the average squared distance from the mean. Its square root, the standard deviation, lives in the same units as the data, so it's easier to interpret.
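Written out, that definition looks like the following. The symbols are this sketch's own notation, not the lesson's: x_1 through x_n are the data points, \mu is their mean, \sigma^2 is the variance, and \sigma is the standard deviation. This is the "population" form that divides by n, matching "average squared distance"; the sample variance commonly used for estimation divides by n - 1 instead.

```latex
\mu = \frac{1}{n} \sum_{i=1}^{n} x_i,
\qquad
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2,
\qquad
\sigma = \sqrt{\sigma^2}
```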

Two classes sit the same quiz and both average 72, so on paper they look identical. But class A scored 70, 72, 74 (everyone bunched together) while class B scored 50, 72, 94 (scattered wide). Same center, utterly different stories: spread is exactly the number that tells them apart.
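To put numbers on the two classes, here is a minimal sketch using Python's standard `statistics` module. It assumes the population form of variance (divide by n), since each class is treated as the whole group rather than a sample; the helper name `describe` is just for illustration.

```python
from statistics import mean, pstdev, pvariance

class_a = [70, 72, 74]  # everyone bunched together
class_b = [50, 72, 94]  # scattered wide


def describe(scores):
    """Return (mean, population variance, population standard deviation)."""
    return mean(scores), pvariance(scores), pstdev(scores)


for name, scores in (("A", class_a), ("B", class_b)):
    m, var, sd = describe(scores)
    print(f"class {name}: mean={m}, variance={var:.2f}, std={sd:.2f}")

# class A: mean=72, variance=2.67, std=1.63
# class B: mean=72, variance=322.67, std=17.96
```

Both means are 72, but class B's standard deviation is roughly eleven times class A's, which is the gap the raw averages hide.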

Where this lives in ML

Spread is everywhere in ML reliability. Gradient variance across a mini-batch controls how noisy each training step is; high variance means a jittery descent. And when you report a model's accuracy, the standard deviation across random seeds is what tells you whether a "+0.3%" improvement is real or just noise. A result without its spread is half a result.
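The seed comparison can be sketched in a few lines. The accuracy numbers below are made up purely for illustration (five hypothetical seeds per model), and the names `baseline`, `candidate`, and `summarize` are assumptions of this sketch. It uses the sample standard deviation (`statistics.stdev`, dividing by n - 1) because five seeds are a sample of all possible runs. Comparing the gain against the spread is a rough sanity check, not a formal significance test.

```python
from statistics import mean, stdev

# Hypothetical test accuracies from five random seeds per model (invented data).
baseline = [0.912, 0.915, 0.909, 0.917, 0.911]
candidate = [0.916, 0.913, 0.919, 0.912, 0.919]


def summarize(scores):
    """Return (mean, sample standard deviation) across seeds."""
    return mean(scores), stdev(scores)


base_mean, base_sd = summarize(baseline)
cand_mean, cand_sd = summarize(candidate)
gain = cand_mean - base_mean

print(f"baseline:  {base_mean:.4f} +/- {base_sd:.4f}")
print(f"candidate: {cand_mean:.4f} +/- {cand_sd:.4f}")
print(f"gain:      {gain:+.4f}")

# Rough check: a gain no larger than the seed-to-seed spread is hard to
# tell apart from noise without more runs.
if abs(gain) <= max(base_sd, cand_sd):
    print("gain is within one standard deviation of seed noise")
else:
    print("gain exceeds the seed-to-seed standard deviation")
```

With these invented numbers the mean gain is about +0.3 percentage points, while each model's accuracy varies by about the same amount from seed to seed, so the headline improvement alone would not be convincing.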
