Conditional Probability

The mathematics of uncertainty

New information changes the odds. Once you learn that "the die came up even," the chance it's a 2 is no longer 1/6. It rises to 1/3, because you've ruled out the odd faces and only three equally likely faces remain. Conditional probability is the machinery for updating a probability when you know some event B has already occurred.
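A minimal sketch of the die example, assuming a fair six-sided die. It counts favorable outcomes before and after discarding the odd faces. The names `omega` and `even` are illustrative and not from any library.

```python
from fractions import Fraction

# Sample space for one fair six-sided die (illustrative name).
omega = [1, 2, 3, 4, 5, 6]

# Event B: "the die came up even". Conditioning keeps only these outcomes.
even = [w for w in omega if w % 2 == 0]

# Unconditional: one favorable face out of six.
p_two = Fraction(sum(1 for w in omega if w == 2), len(omega))

# Conditional: one favorable face out of the three that remain.
p_two_given_even = Fraction(sum(1 for w in even if w == 2), len(even))

print(p_two)             # 1/6
print(p_two_given_even)  # 1/3
```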

Read P(A | B) as "the probability of A given B." It is defined as P(A | B) = P(A ∩ B) / P(B), whenever P(B) > 0. Geometrically it's a zoom-and-renormalize: throw away everything outside B, treat B as the new whole world, and ask what fraction of that world is also in A. Dividing by P(B) rescales so the shrunken world still has total probability 1.
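The zoom-and-renormalize idea can be sketched for any finite distribution, not just equally likely outcomes. This sketch assumes outcomes are stored in a dict of probabilities and events are given as predicates; the helper `conditional` and the loaded die are hypothetical examples. Because the die is loaded, counting faces would give the wrong answer, so dividing by P(B) is what does the work.

```python
from fractions import Fraction


def conditional(p, a, b):
    """P(A | B) = P(A and B) / P(B) for a finite distribution.

    p: dict mapping each outcome to its probability (hypothetical layout).
    a, b: predicates (outcome -> bool) defining events A and B.
    """
    p_b = sum(prob for w, prob in p.items() if b(w))
    if p_b == 0:
        raise ValueError("P(B) = 0, so conditioning on B is undefined")
    p_a_and_b = sum(prob for w, prob in p.items() if a(w) and b(w))
    return p_a_and_b / p_b


# Hypothetical loaded die: face 6 is three times as likely as each other face.
loaded = {w: Fraction(1, 8) for w in range(1, 6)}
loaded[6] = Fraction(3, 8)


def is_even(w):
    return w % 2 == 0


def is_six(w):
    return w == 6


# P(even) = 1/8 + 1/8 + 3/8 = 5/8, P(six and even) = 3/8, ratio = 3/5.
print(conditional(loaded, is_six, is_even))  # 3/5

# The zoomed-in world B still has total probability 1.
total = sum(conditional(loaded, lambda w, v=v: w == v, is_even) for v in loaded)
print(total)  # 1
```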

Imagine a screening test that just came back positive. That clue does not change reality, but it narrows the possibilities: you can throw away everyone whose test was negative and look only at the positive group B. Letting A be the event "has the disease," the question "do I actually have the disease?" becomes P(A | B), the fraction of that narrowed-down group who are truly sick.
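To make the narrowing concrete, here is a counting sketch over an imagined population. All numbers (prevalence, sensitivity, false-positive rate, population size) are hypothetical and chosen for easy arithmetic; they do not describe any real test.

```python
from fractions import Fraction

# Hypothetical numbers chosen for illustration only (not from any real test).
population = 10_000
prevalence = Fraction(1, 100)           # P(sick)
sensitivity = Fraction(90, 100)         # P(positive | sick)
false_positive_rate = Fraction(5, 100)  # P(positive | healthy)

sick = population * prevalence                   # 100 people
healthy = population - sick                      # 9,900 people
true_positives = sick * sensitivity              # 90 sick people test positive
false_positives = healthy * false_positive_rate  # 495 healthy people test positive

# Condition on B = "tested positive": discard everyone who tested negative.
positives = true_positives + false_positives     # 585 people in group B

# P(A | B): fraction of the positive group who are actually sick.
p_sick_given_positive = true_positives / positives
print(p_sick_given_positive)                 # 2/13
print(round(float(p_sick_given_positive), 3))  # 0.154
```

Under these made-up numbers, most positives come from the large healthy group, so a positive result leaves only about a 15% chance of disease.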

Where this lives in ML

A classifier computes a conditional probability. Its whole job is to estimate P(class | input), the probability of each label given the pixels or tokens it sees. The softmax output vector is the model's estimate of P(y | x). Conditioning on the input is what turns a prior over classes into a prediction.
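A minimal sketch of how a softmax turns raw scores into a conditional distribution over labels. The labels and logits are hypothetical stand-ins for what a trained classifier might output for one input x; no real model is involved.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability vector (non-negative, sums to 1)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical class labels and logits a classifier might output for one input x.
labels = ["cat", "dog", "bird"]
logits = [2.0, 1.0, 0.1]

probs = softmax(logits)
for label, p in zip(labels, probs):
    print(f"P(y={label} | x) = {p:.3f}")
```

Each entry is one value of P(y | x) for this particular x; a different input would produce different logits and therefore a different conditional distribution.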