Probability Axioms

The mathematics of uncertainty

How do you assign a number to "how likely"? Andrey Kolmogorov showed that the entire theory rests on just three rules. Every other formula you'll use is a consequence of these.
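For reference, here is the standard formal statement. Ω is the sample space and A, A₁, A₂, … are events in it. The third axiom is written in Kolmogorov's countable form.

```latex
\begin{aligned}
&\text{(1) Non-negativity:} && P(A) \ge 0 \quad \text{for every event } A \\
&\text{(2) Normalization:}  && P(\Omega) = 1 \\
&\text{(3) Additivity:}     && P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i)
   \quad \text{whenever } A_i \cap A_j = \emptyset \text{ for all } i \ne j
\end{aligned}
```

As a special case, for two disjoint events A and B the third axiom gives P(A ∪ B) = P(A) + P(B).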

In words: probabilities are never negative; the probability that something happens is exactly 1; and for events that can't overlap, probabilities simply add. That's it. A probability is a way of splitting the total mass 1 across the outcomes.
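For a finite set of outcomes, the first two rules can be checked mechanically. The sketch below uses a hypothetical helper, `is_probability_distribution`, that takes a dict mapping each outcome to its probability. Float sums are compared with a tolerance because values like 0.1 do not add up exactly in binary floating point.

```python
import math


def is_probability_distribution(probs, tol=1e-9):
    """Return True if {outcome: probability} satisfies axioms 1 and 2.

    Axiom 3 needs no separate check here: once an event's probability is
    defined as the sum over its outcomes, disjoint events add automatically.
    """
    # Axiom 1: no outcome may carry negative probability.
    if any(p < 0 for p in probs.values()):
        return False
    # Axiom 2: the total mass must be 1 (within floating-point tolerance).
    return math.isclose(sum(probs.values()), 1.0, abs_tol=tol)


fair_coin = {"H": 0.5, "T": 0.5}
bad_negative = {"H": 1.2, "T": -0.2}  # one slice is negative
bad_total = {"H": 0.6, "T": 0.6}      # no negatives, but the total is 1.2

print(is_probability_distribution(fair_coin))     # True
print(is_probability_distribution(bad_negative))  # False
print(is_probability_distribution(bad_total))     # False
```

The `bad_negative` case shows why axiom 2 alone is not enough. On paper 1.2 and −0.2 add up to 1, yet a negative slice is still not allowed.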

Picture a whole pie cut into slices, one slice per outcome. No slice can have negative size (that is the rule P(A) ≥ 0), and all the slices together must fill the entire pie, never more and never less, which is exactly P(Ω) = 1. Asking for the probability of an event just means adding up the slices that belong to it.
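Here is a sketch of the pie picture for a fair six-sided die, assuming each face is equally likely. Each face is one slice of mass 1/6, and a hypothetical `prob` function adds up the slices that belong to an event. `fractions.Fraction` keeps the arithmetic exact, so the checks are equalities rather than approximations.

```python
from fractions import Fraction

# Each face of a fair six-sided die is one "slice" of mass 1/6.
slices = {face: Fraction(1, 6) for face in range(1, 7)}


def prob(event, slices):
    """P(event) = total mass of the slices (outcomes) that belong to the event."""
    return sum((slices[w] for w in event), Fraction(0))


omega = set(slices)  # the whole pie
even = {2, 4, 6}
low = {1, 2}
five = {5}

print(prob(omega, slices))  # 1   (axiom 2: the slices fill the pie)
print(prob(even, slices))   # 1/2
# {1, 2} and {5} do not overlap, so their probabilities add (axiom 3).
print(prob(low | five, slices) == prob(low, slices) + prob(five, slices))  # True
```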

Where this lives in ML

A softmax layer turns raw scores into a probability distribution that obeys these axioms by construction: each output is non-negative (axiom 1), and the outputs sum to 1 across classes (axiom 2). When a model reports "P(cat) = 0.7", the other classes share the remaining 0.3, because P(not cat) = 1 − P(cat). That is the complement rule in action. Any time you renormalize scores into probabilities, you are enforcing…
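The sketch below is a minimal softmax in plain Python. The class names and scores are made up for illustration and do not come from a real model. It shows both axioms holding by construction, plus the complement rule. "cat" and "not cat" are disjoint and together make up Ω, so axioms 2 and 3 give P(cat) + P(not cat) = 1.

```python
import math


def softmax(scores):
    """Map raw scores (logits) to a probability distribution.

    Subtracting the max score first does not change the result,
    but it keeps math.exp from overflowing on large scores.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # every term is > 0
    total = sum(exps)
    return [e / total for e in exps]          # dividing by the total normalizes to 1


classes = ["cat", "dog", "bird"]
logits = [2.0, 1.0, 0.1]  # illustrative scores only
probs = dict(zip(classes, softmax(logits)))

# Axiom 1: every output is non-negative (in fact strictly positive).
print(all(p >= 0 for p in probs.values()))  # True
# Axiom 2: the outputs sum to 1, up to floating-point rounding.
print(math.isclose(sum(probs.values()), 1.0))  # True
# Complement rule: the mass not on "cat" is exactly what the other classes share.
print(math.isclose(1 - probs["cat"], probs["dog"] + probs["bird"]))  # True
```

Subtracting the maximum is a common numerical-stability choice. Without it, a score like 1000.0 would make `math.exp` raise an `OverflowError`.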