Random Variables

The mathematics of uncertainty

Outcomes like "heads" or "the third red card" are awkward to do arithmetic with. A random variable fixes that: it's a rule that attaches a number to every outcome. Formally, X: Ω → ℝ, a function from the sample space Ω to the real numbers. Flip three coins and let X count the heads. Now each outcome maps to 0, 1, 2, or 3, and we can average, square, and sum.
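The three-coin example can be sketched in a few lines of Python. This is only an illustration: the names omega and X are chosen here to mirror the notation, and the average at the end assumes the coins are fair, so that all eight outcomes are equally likely.

```python
from itertools import product

# Sample space Ω: all 8 sequences of three coin flips, e.g. ('H', 'T', 'H').
omega = list(product("HT", repeat=3))

def X(outcome):
    """Random variable: the number of heads in one outcome."""
    return outcome.count("H")

# X attaches a number to every outcome.
values = {outcome: X(outcome) for outcome in omega}

# With numbers in hand we can do arithmetic. Assuming fair coins
# (each outcome equally likely), the average number of heads is:
mean_heads = sum(values.values()) / len(omega)
print(mean_heads)  # 1.5
```

Note that X itself is an ordinary deterministic function. The randomness comes entirely from which outcome in Ω occurs.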

A carnival wheel lands on colored wedges, and each color pays a different amount: a number stuck onto every outcome. That number is a random variable X, the cash you win on a spin. Listing how often each payout comes up, p(x) = P(X = x), gives you the full distribution of your prize.

For a discrete random variable, the probability mass function p(x) = P(X = x) lists the probability of each value. It must be non-negative and sum to 1 across the support, which is just the probability axioms re-expressed on numbers.
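As a rough sketch, here is the PMF of X = number of heads in three flips, built by counting outcomes and then checked against the two conditions above. It assumes fair coins, and the helper is_valid_pmf is a name made up for this illustration. Exact fractions are used so the sum-to-1 check is not affected by floating-point rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for three flips; assumed equally likely (fair coins).
omega = list(product("HT", repeat=3))

# Count how many outcomes map to each value of X.
counts = Counter(outcome.count("H") for outcome in omega)

# p(x) = P(X = x) = (outcomes with x heads) / (total outcomes)
pmf = {x: Fraction(c, len(omega)) for x, c in counts.items()}

def is_valid_pmf(p):
    """Check the two PMF conditions: non-negative, sums to 1."""
    return all(v >= 0 for v in p.values()) and sum(p.values()) == 1

for x in sorted(pmf):
    print(x, pmf[x])
print(is_valid_pmf(pmf))  # True
```

The support here is {0, 1, 2, 3}, with probabilities 1/8, 3/8, 3/8, and 1/8.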

Where this lives in ML

A label Y is a random variable, and so is a model's prediction. The predicted class, the argmax of a softmax, is a random variable that collapses the model's output distribution to a single index. Sampling from a language model is drawing a random variable (the next token) from its PMF over the vocabulary.
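The following is a minimal sketch of both ideas using only the Python standard library. The four-word vocabulary and the logits are hypothetical values invented for illustration, not the output of any real model, and the softmax helper is written out by hand rather than taken from a library.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a PMF (non-negative, sums to 1)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["the", "cat", "sat", "mat"]  # hypothetical toy vocabulary
logits = [2.0, 1.0, 0.5, -1.0]        # hypothetical model scores

pmf = softmax(logits)

# Predicted class: the index with the highest probability (argmax).
predicted = max(range(len(pmf)), key=pmf.__getitem__)

# Sampling: draw the next token from the PMF over the vocabulary.
rng = random.Random(0)  # seeded only to make the run repeatable
token = rng.choices(vocab, weights=pmf, k=1)[0]

print(vocab[predicted], token)
```

The argmax always returns the same index for the same logits, while the sampled token can differ from draw to draw, which is the difference between reading off the mode of a PMF and drawing from it.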