Distributions of Data

Inference, estimation, and decision-making from data

A single center and a single spread are just two numbers. The full shape of the data (its distribution) carries far more. The fastest way to see it is a histogram: chop the range into bins and count how many values fall in each. A smoothed version is a density plot.
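To make the binning step concrete, here is a minimal sketch in plain Python. The function name `histogram`, the choice of equal-width bins, and the sample values are assumptions made for illustration; plotting libraries offer their own binning rules.

```python
def histogram(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


# Made-up sample: most values are small, one sits far to the right.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
counts = histogram(data, 4)

# Crude text rendering: one '#' per value in the bin.
for i, c in enumerate(counts):
    print(f"bin {i}: {'#' * c}")
```

The bin count is a judgment call: too few bins hide the shape, too many turn it into noise, so it is worth trying a few values.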

Once you can see the shape, two questions matter: is it symmetric or skewed, and are its tails heavy or light?

Skewness measures asymmetry. A right-skewed (positively skewed) distribution has a long tail stretching to the right: incomes, wait times, file sizes. A left-skewed one trails to the left. For a right-skewed shape the mean typically sits to the right of the median, dragged out by the tail.
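A small numeric check can make the pull of the tail visible. The sketch below uses Python's standard `statistics` module and a made-up list of wait times. The `skewness` helper is an illustrative definition (the mean of cubed z-scores, with the population standard deviation), one of several conventions in use.

```python
import statistics


def skewness(values):
    """Population skewness: the average cubed z-score."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Made-up wait times in minutes: mostly short, one very long.
wait_times = [1, 2, 2, 3, 3, 3, 4, 5, 8, 29]

mean = statistics.fmean(wait_times)
median = statistics.median(wait_times)

print(f"mean   = {mean}")    # pulled toward the long wait
print(f"median = {median}")  # stays with the bulk of the data
print(f"skew   = {skewness(wait_times):.2f}")  # positive for a right tail
```

Here the single 29-minute wait lifts the mean to 6.0 while the median stays at 3.0, and the skewness comes out positive.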

Where this lives in ML

Distribution shape drives real ML decisions. Activation distributions inside a network can drift and develop heavy tails, which is one motivation for batch and layer normalization. Loss distributions across batches reveal whether your model fails uniformly or chokes on a heavy-tailed minority of hard examples. And heavy tails are a large part of why robust losses (such as Huber) and gradient clipping are standard practice.
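To illustrate why a robust loss helps with heavy tails, here is a minimal sketch comparing squared loss with Huber loss on a single residual. The threshold `delta=1.0` and the sample residuals are assumptions chosen for illustration, not recommended settings.

```python
def squared_loss(residual):
    """Half squared error: grows quadratically with the residual."""
    return 0.5 * residual ** 2


def huber_loss(residual, delta=1.0):
    """Quadratic near zero, linear once |residual| exceeds delta."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual ** 2
    return delta * (a - 0.5 * delta)


# Made-up residuals: three typical errors and one heavy-tail outlier.
for r in [0.1, 0.5, 1.0, 10.0]:
    print(f"residual={r:5.1f}  squared={squared_loss(r):6.3f}  huber={huber_loss(r):6.3f}")
```

For small residuals the two losses agree, but at a residual of 10 the squared loss is 50 while Huber is 9.5, so one extreme example contributes far less to the total. The slope of Huber loss is the residual clipped to the range from -delta to +delta, which limits the influence of a single outlier in a way similar in spirit to gradient clipping.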