Entropy

The mathematics of uncertainty

Entropy measures uncertainty: how surprised you expect to be by a random outcome. A fair coin is maximally uncertain; a two-headed coin holds no surprise at all. Claude Shannon turned this into a number. The surprise of an outcome x is −log p(x), so rarer outcomes are more surprising, and entropy is the expected surprise: H(X) = −Σₓ p(x) log p(x).
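A minimal sketch of Shannon's definition in Python. It assumes the probabilities arrive as a plain list that sums to 1. The helper names surprise and entropy are illustrative, not a library API. Zero-probability outcomes are skipped, following the standard convention 0 · log 0 = 0.

```python
import math

def surprise(p: float) -> float:
    """Surprise (self-information) of an outcome with probability p, in bits."""
    return -math.log2(p)

def entropy(probs: list[float]) -> float:
    """Shannon entropy in bits: the probability-weighted average surprise.

    Outcomes with p = 0 contribute nothing (convention 0 * log 0 = 0).
    """
    return sum(p * surprise(p) for p in probs if p > 0)

print(surprise(0.5))        # 1.0 bit: one fair coin flip
print(surprise(1 / 8))      # 3.0 bits: rarer, so more surprising
print(entropy([0.5, 0.5]))  # 1.0: a fair coin
print(entropy([1.0, 0.0]))  # 0.0: a two-headed coin
```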

Taking logarithms base 2 (log₂) measures entropy in bits: roughly, the average number of yes/no questions an optimal questioner needs to pin down the outcome. Entropy is largest when the distribution is uniform (every outcome equally likely, maximum confusion), where it equals log₂ n for n outcomes, and zero when one outcome is certain (no surprise possible).
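To make the bits-as-questions reading concrete, here is a sketch comparing three made-up distributions over 8 outcomes. All their probabilities are powers of 1/2, which is exactly the case where entropy equals the average number of questions under an optimal strategy; for other distributions the optimum lies between H and H + 1 questions. The helper name entropy_bits is illustrative.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

n = 8
uniform = [1 / n] * n                             # every outcome equally likely
skewed = [0.5, 0.25, 0.125, 0.125, 0, 0, 0, 0]    # a few outcomes dominate
certain = [1.0] + [0.0] * (n - 1)                 # one outcome is guaranteed

print(entropy_bits(uniform))  # 3.0 = log2(8): three halving questions
print(entropy_bits(skewed))   # 1.75: below the uniform maximum
print(entropy_bits(certain))  # 0.0: nothing left to ask
```

For the skewed case the matching strategy is "Is it the first outcome?" (answered after 1 question half the time), then "Is it the second?", then one final question, which averages 0.5·1 + 0.25·2 + 0.25·3 = 1.75 questions.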

The figure shows the entropy of a single biased coin, H(p) = −p log₂ p − (1−p) log₂(1−p). As p varies, entropy peaks at p = 0.5 (1 full bit, a genuine coin flip) and drops to 0 at the certain ends, p = 0 and p = 1.
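The coin curve can also be tabulated with a short sketch. The function name binary_entropy is hypothetical. The endpoints are special-cased because log₂ 0 is undefined; by the 0 · log 0 = 0 convention, H(0) = H(1) = 0.

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1-p) log2(1-p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no surprise
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:.2f}  H(p) = {binary_entropy(p):.3f} bits")
```

The printed values are symmetric about p = 0.5, since swapping heads and tails leaves the uncertainty unchanged: H(p) = H(1 − p).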

Where this lives in ML

Entropy underlies nearly every classification loss. It sets the floor for lossless compression: no lossless code can use fewer bits per symbol, on average, than the source's entropy. It also anchors cross-entropy (next lesson), the standard training loss. In RL and exploration, an entropy bonus is added to the objective to keep a policy from collapsing onto a single action too soon: maximizing entropy means "stay uncertain, keep exploring." Decision trees split on whichever feature reduces entropy…
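Decision trees commonly score a candidate split by its information gain: the entropy of the labels before the split minus the size-weighted entropy of the groups after it. A minimal sketch, assuming categorical features; the function names and the toy labels, color, and size data are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of the empirical class distribution of labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy before the split minus size-weighted entropy after it."""
    n = len(labels)
    groups = {}
    for y, v in zip(labels, feature_values):
        groups.setdefault(v, []).append(y)  # partition labels by feature value
    after = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - after

# Toy data: 4 positives and 4 negatives.
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
color = ["red", "red", "red", "red", "blue", "blue", "blue", "blue"]  # separates the classes
size = ["big", "small", "big", "small", "big", "small", "big", "small"]  # uninformative

print(information_gain(labels, color))  # 1.0
print(information_gain(labels, size))   # 0.0
```

A tree-growing algorithm such as ID3 picks, at each node, the feature with the highest gain; on this toy data it would split on color.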