Expectation

The mathematics of uncertainty

The expectation of a random variable is its long-run average: the value the sample mean converges to if you repeat the experiment forever and average the results. It's a weighted average of the possible values, each weighted by how likely it is:

E[X] = Σ_x x · p(x), where p(x) = P(X = x) is the PMF of X.

Think of the PMF as a set of weights placed along a ruler; E[X] is the balance point. It need not be a value X can actually take. A fair die averages to 3.5, which no face shows.
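To make the balance-point picture concrete, here is a minimal sketch that computes the weighted average for a fair die. The helper name `expectation` and the dictionary representation of the PMF are illustrative choices, not anything fixed by the text; exact fractions are used so the result is not blurred by floating-point rounding.

```python
from fractions import Fraction


def expectation(pmf):
    """Return E[X], the sum of x * p(x), for a PMF given as {value: probability}."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())


# A fair six-sided die: each face has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

die_mean = expectation(fair_die)
print(die_mean)              # 7/2
print(die_mean in fair_die)  # False: 3.5 is not a face of the die
```

The result 7/2 sits exactly midway between 3 and 4, which is where the six equal weights balance.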

Picture a slot machine you feed thousands of times. On any single pull you might win big or lose your coin, but the machine has a fixed long-run average payout per play, and that number is E[X]. It is the steady value your average creeps toward as the plays pile up, even though no single spin ever lands exactly on it.
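The slot-machine picture can be simulated directly. The sketch below assumes a made-up payout table (the values and probabilities in `payouts` are hypothetical, chosen only for illustration) and compares the average result over a growing number of plays with the exact E[X].

```python
import random

# Hypothetical machine: net result per play (in coins) -> probability.
payouts = {-1: 0.90, 4: 0.09, 9: 0.01}

# Exact expectation: the probability-weighted average of the outcomes.
expected = sum(x * p for x, p in payouts.items())


def running_average(n_plays, seed=0):
    """Average net result over n_plays simulated pulls of the machine."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    draws = rng.choices(list(payouts), weights=list(payouts.values()), k=n_plays)
    return sum(draws) / n_plays


print(f"exact E[X] = {expected:.4f}")
for n in (10, 1_000, 200_000):
    print(f"{n:>7} plays: average = {running_average(n):.4f}")
```

With only a few plays the average can land far from E[X]; as the number of plays grows it settles close to it, even though no single pull ever pays out exactly that amount.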

Where this lives in ML

Training minimizes an expected loss E_D[L(θ)], the average loss over the data distribution. We can't compute that expectation exactly, so we approximate it by an average over a finite sample (the training set), and over a mini-batch for each gradient step. Linearity of expectation is why the average gradient over a batch is an unbiased estimate of the true gradient.
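The unbiasedness claim can be checked exactly on a toy problem. This sketch makes several assumptions not in the text: a tiny made-up dataset stands in for the data distribution, the loss is the squared error (θ − x)², and mini-batches are drawn uniformly without replacement. Averaging the batch gradient over every possible batch then recovers the full gradient.

```python
from fractions import Fraction
from itertools import combinations

# Toy setup (all values are illustrative assumptions).
data = [Fraction(v) for v in (1, 2, 4, 7, 11)]
theta = Fraction(3)
batch_size = 2


def grad(theta, x):
    """Gradient with respect to theta of the per-example loss (theta - x)^2."""
    return 2 * (theta - x)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# "True" gradient here: the average per-example gradient over the whole dataset.
full_grad = mean(grad(theta, x) for x in data)

# Gradient of every possible mini-batch, each batch being equally likely.
batch_grads = [
    mean(grad(theta, x) for x in batch)
    for batch in combinations(data, batch_size)
]

# Expectation of the mini-batch gradient over the random choice of batch.
expected_batch_grad = mean(batch_grads)

print(full_grad, expected_batch_grad)  # -4 -4
```

Individual batch gradients differ from one another, but their average matches the full gradient exactly, which is what "unbiased" means.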