Conditional Distributions

The mathematics of uncertainty

Conditional distributions are conditional probability, lifted to whole random variables. Given that X = x, how is Y distributed? You take the joint and renormalize by the marginal of the thing you've fixed:

p(y | x) = p(x, y) / p(x),   defined wherever p(x) > 0.

It's the same zoom-and-renormalize move from Lesson 3: fix X = x (pick one row of the joint table), then rescale that row so its probabilities sum to 1. The result is a genuine distribution over Y, one for each value of x.

Go back to the height–weight table, but now look at a single row (say, only the tall people) and ignore everyone else. That row's numbers don't add to 1 on their own, so you rescale them until they do, and what you get is how weight is distributed given that height is tall. That is a conditional distribution: fix X at one value x, then renormalize that slice into a proper distribution over Y.
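The row-slicing move can be sketched in a few lines of plain Python. The table below is a hypothetical stand-in for the height–weight table: the category names, the probabilities, and the helper `conditional_given_row` are all invented for illustration and are not taken from the lesson.

```python
# Hypothetical joint table p(height, weight); all numbers are invented for illustration.
heights = ["short", "medium", "tall"]
weights = ["light", "average", "heavy"]
joint = [
    [0.15, 0.10, 0.05],  # short
    [0.10, 0.20, 0.10],  # medium
    [0.05, 0.10, 0.15],  # tall
]

def conditional_given_row(joint, i):
    """Return p(Y | X = x_i): take row i and divide by its sum, the marginal p(X = x_i)."""
    row = joint[i]
    marginal = sum(row)  # p(X = x_i), the total probability in this row
    if marginal == 0:
        raise ValueError("cannot condition on a zero-probability value")
    return [p / marginal for p in row]

# Fix height = tall, then renormalize that row into a distribution over weight.
p_weight_given_tall = conditional_given_row(joint, heights.index("tall"))
for w, p in zip(weights, p_weight_given_tall):
    print(f"p(weight={w} | height=tall) = {p:.3f}")
```

With these made-up numbers the tall row holds 0.30 of the total probability, so the rescaled values come out to roughly 0.167, 0.333, and 0.500. Calling the helper on a different row gives a different distribution over weight, which is the sense in which there is one conditional distribution per value of x.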

Where this lives in ML

A discriminative model is a conditional distribution: p(y | x) is precisely what a classifier or regressor learns, the label distribution given the input. A decoder in a VAE or diffusion model is a conditional p(x | z), the data distribution given a latent code. Conditioning is how generative models steer output: text-to-image is sampling from p(image | prompt).
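To make the classifier case concrete, here is a minimal sketch of a model whose output is a distribution p(y | x). It assumes a tiny linear softmax classifier: the labels, the weights `W`, the biases `b`, and the function name `p_y_given_x` are hypothetical and chosen only for illustration, not a trained model or any library's API.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear classifier: one weight vector and bias per class, values invented.
labels = ["cat", "dog", "bird"]
W = [[1.0, -0.5], [0.2, 0.8], [-1.0, 0.3]]
b = [0.0, 0.1, -0.2]

def p_y_given_x(x):
    """p(y | x): a distribution over labels that changes with the input x."""
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b_k
              for w, b_k in zip(W, b)]
    return softmax(logits)

probs = p_y_given_x([2.0, 1.0])
for label, p in zip(labels, probs):
    print(f"p(y={label} | x) = {p:.3f}")

# Conditioning steers sampling: draw a label from p(y | x) for this particular x.
rng = random.Random(0)
sample = rng.choices(labels, weights=probs, k=1)[0]
print("sampled label:", sample)
```

Feeding in a different x yields a different distribution over the same labels. A conditional generative model follows the same pattern at a much larger scale, with the prompt or latent code in the role of x.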