Measures of Center

Inference, estimation, and decision-making from data

Before you model data, you have to summarize it honestly. The most basic summary is a single number that answers "where is the data centered?" There are three classic answers, and they don't always agree, which is exactly why you need to know all three.

The mean is the balance point: add every value, divide by how many there are. The median is the middle value once you sort them. The mode is simply the most common value.

Picture the asking prices on one short street, in hundreds of thousands: 3, 4, 4, 5, 30. Four ordinary homes and one waterfront mansion. The mean price is 46/5 = 9.2, yet not a single ordinary house costs anywhere near that. The median, the middle value once sorted, is just 4 and reports the typical home honestly, because the lone mansion can't drag the middle of the list very far.
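The street example can be checked in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names (`prices`, `ordinary`, and so on) are chosen here for illustration, and the cutoff used to drop the mansion is an assumption specific to this toy list.

```python
from statistics import mean, median, mode

# Asking prices from the street example, in hundreds of thousands.
prices = [3, 4, 4, 5, 30]

avg = mean(prices)      # balance point: sum of values / count
mid = median(prices)    # middle value of the sorted list
common = mode(prices)   # most frequent value

print(f"mean={avg}, median={mid}, mode={common}")

# Drop the mansion (assumed here to be any price of 30 or more)
# to see how far one value moved each summary.
ordinary = [p for p in prices if p < 30]
avg_ordinary = mean(ordinary)
mid_ordinary = median(ordinary)

print(f"without mansion: mean={avg_ordinary}, median={mid_ordinary}")
```

Removing the single mansion pulls the mean from 9.2 down to 4, while the median stays at 4 either way.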

Where this lives in ML

Every loss metric you report is a measure of center over the test set. "Mean squared error" averages the squared errors, and because the mean is sensitive to extreme values, a few catastrophic predictions dominate it. Report the median error too when you suspect a heavy tail. It tells you what a typical example experiences, not what the worst few do to the average.
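To make the contrast concrete, here is a small sketch that compares the mean and the median of the squared errors. The targets and predictions are invented purely for illustration (they come from no real model or dataset), and the median of squared errors is used as one possible reading of "median error"; the median absolute error would be an equally reasonable choice.

```python
from statistics import mean, median

# Hypothetical targets and predictions, invented for illustration only.
# The last example is a deliberately catastrophic miss.
y_true = [10.0, 12.0, 11.0, 13.0, 12.0, 50.0]
y_pred = [10.5, 11.5, 11.0, 12.5, 12.5, 20.0]

# Per-example squared error.
squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

mse = mean(squared_errors)          # pulled up by the single large miss
median_se = median(squared_errors)  # what a typical example experiences

print(f"mean squared error:   {mse:.2f}")
print(f"median squared error: {median_se:.2f}")
```

With these made-up numbers, five of the six errors are at most 0.25, yet the one large miss lifts the mean squared error above 150. Reporting both numbers shows that the model is usually close and occasionally very wrong, which neither number reveals alone.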