The mathematics of uncertainty
So far each random variable lived alone. But the interesting questions are about relationships: height and weight, an image and its label. A joint distribution p(x, y) gives the probability of every pair of values at once. It's the complete description of how two (or more) variables behave together.
For discrete variables, picture a grid: rows are values of X, columns are values of Y, and each cell holds the probability of that combination. Every cell is non-negative and the cells sum to 1: the axioms again, now in two dimensions. For continuous variables, the joint distribution is a density f(x, y), and the probability that (X, Y) lands in a region is the volume under the density surface over that region.
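Written out in symbols, the conditions described above look roughly like this. The region symbol A is an assumption introduced here for illustration; it stands for any set of (x, y) pairs whose probability we want.

```latex
% Discrete case: a grid of probabilities
p(x, y) \ge 0 \quad \text{for all } x, y,
\qquad
\sum_{x} \sum_{y} p(x, y) = 1 .

% Continuous case: a density, with probabilities as volumes
f(x, y) \ge 0,
\qquad
\iint f(x, y) \, dx \, dy = 1,
\qquad
P\big((X, Y) \in A\big) = \iint_{A} f(x, y) \, dx \, dy .
```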
Imagine a two-way table of people sorted by height and weight at the same time: short-and-light in one cell, tall-and-heavy in another, and a number in every cell saying how common that pairing is. That whole grid of pairings is the joint distribution p(x, y): it describes height and weight together, not one at a time. Fill in every cell so that the entries are non-negative and add up to 1, and you have captured the complete picture of how the two traits travel together.
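The height-and-weight table can be sketched in a few lines of Python. This is only an illustration: the four probabilities are made up, and the names `joint`, `is_valid_joint`, and `prob` are hypothetical helpers, not part of any library.

```python
import math

# Hypothetical joint probabilities for (height, weight). The numbers are
# invented for illustration and do not come from any dataset.
joint = {
    ("short", "light"): 0.30,
    ("short", "heavy"): 0.10,
    ("tall", "light"): 0.15,
    ("tall", "heavy"): 0.45,
}


def is_valid_joint(table, tol=1e-9):
    """Check the two conditions: every cell >= 0 and the cells sum to 1."""
    non_negative = all(p >= 0 for p in table.values())
    sums_to_one = math.isclose(sum(table.values()), 1.0, abs_tol=tol)
    return non_negative and sums_to_one


def prob(table, event):
    """Add up the cells where event(height, weight) is true."""
    return sum(p for (h, w), p in table.items() if event(h, w))


print(is_valid_joint(joint))
# Probability of one specific pairing (a single cell).
print(prob(joint, lambda h, w: h == "tall" and w == "heavy"))
# Probability of an event that spans several cells.
print(prob(joint, lambda h, w: h == "tall" or w == "heavy"))
```

Because the table holds every pairing, the probability of any event about the two traits is just a sum over the matching cells, which is what `prob` does.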