Key Discrete Distributions

The mathematics of uncertainty

A handful of named distributions cover most discrete situations in ML. Each is a ready-made PMF with known mean and variance, so you reach for the right one instead of re-deriving from scratch.

Bernoulli(p) models one trial with two outcomes: success (1) with probability p, failure (0) with probability 1−p. It's the building block for many other discrete distributions; the Binomial, for example, is a sum of independent Bernoulli trials.
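To make the definition concrete, here is a minimal Python sketch of the Bernoulli PMF, with the mean and variance computed directly from it. The function name bernoulli_pmf and the value p = 0.3 are illustrative assumptions, not part of the lesson.

```python
def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p). Only x = 0 and x = 1 have nonzero mass."""
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0

p = 0.3  # assumed success probability, chosen only for illustration

# Mean and variance computed from the PMF by summing over both outcomes.
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))

print(mean, variance)  # these match the closed forms p and p * (1 - p)
```

The sums reproduce the known closed forms: the mean is p and the variance is p(1−p), which is why you rarely need to derive them by hand.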

Two everyday counts show off the headline distributions. Flip a coin 10 times and tally the heads: that count is Binomial, a sum of 10 independent yes/no trials. Now count the phone calls a help desk receives in one hour: that count is commonly modeled as Poisson, the law for independent events sprinkled across time at a steady average rate, with a single rate parameter λ that doubles as both its mean and its variance.
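The two examples can be checked numerically. The sketch below uses only the Python standard library and assumes a fair coin (p = 0.5) and a call rate of λ = 4 per hour; both values, and the helper names, are illustrative assumptions. The Poisson sum is truncated at 100 terms, which leaves out only a negligible tail at this rate.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): k events at average rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def mean_and_variance(support, pmf):
    """Mean and variance of a discrete distribution over the given support."""
    mean = sum(k * pmf(k) for k in support)
    variance = sum((k - mean) ** 2 * pmf(k) for k in support)
    return mean, variance

# Heads in 10 flips of a coin assumed to be fair.
n, p = 10, 0.5
binom_mean, binom_var = mean_and_variance(
    range(n + 1), lambda k: binomial_pmf(k, n, p)
)

# Calls per hour at an assumed rate of 4; the infinite support is truncated.
lam = 4.0
pois_mean, pois_var = mean_and_variance(
    range(100), lambda k: poisson_pmf(k, lam)
)

print(binom_mean, binom_var)  # close to n*p and n*p*(1-p)
print(pois_mean, pois_var)    # both close to lam
```

For the Binomial the results agree with np and np(1−p); for the Poisson both numbers land on λ, which is the "mean equals variance" property described above.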

Where this lives in ML

When you pick a classification loss, you are really picking one of these distributions. Binary cross-entropy is the negative log-likelihood of a Bernoulli: it scores a model's single probability against a 0/1 label. Multi-class cross-entropy is the negative log-likelihood of a Categorical, the softmax output scored against a one-hot label. The loss you choose encodes which distribution you assume…
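The equivalence between the losses and the negative log-likelihoods can be verified in a few lines. This is a plain-Python sketch rather than any framework's implementation; the function names, the example probability 0.8, and the example logits are all assumptions made for illustration.

```python
import math

def bernoulli_nll(y, p):
    """Negative log-likelihood of a 0/1 label y under Bernoulli(p)."""
    return -math.log(p if y == 1 else 1.0 - p)

def binary_cross_entropy(y, p):
    """Textbook binary cross-entropy for one example."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def softmax(logits):
    """Turn raw scores into probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_nll(label, probs):
    """Negative log-likelihood of a class index under Categorical(probs)."""
    return -math.log(probs[label])

def cross_entropy(one_hot, probs):
    """Textbook multi-class cross-entropy against a one-hot target."""
    return -sum(t * math.log(q) for t, q in zip(one_hot, probs))

# Binary case: predicted probability 0.8 (assumed), scored against each label.
for y in (0, 1):
    assert math.isclose(binary_cross_entropy(y, 0.8), bernoulli_nll(y, 0.8))

# Multi-class case: three assumed logits, true class index 2.
probs = softmax([2.0, 0.5, -1.0])
one_hot = [0, 0, 1]
assert math.isclose(cross_entropy(one_hot, probs), categorical_nll(2, probs))
```

In both cases the one-hot (or 0/1) label zeroes out every term except the log-probability of the observed outcome, so the cross-entropy formula collapses to the negative log-likelihood.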