Gaussian Distribution

The mathematics of uncertainty

The Gaussian (normal) distribution is the distribution you meet most often in machine learning. It's the smooth, symmetric bell you get, at least approximately, whenever many small independent effects add up (the central limit theorem). Two numbers fix it completely: the mean μ (where the peak sits) and the variance σ² (how wide the bell is).
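
To see the "many small effects add up" claim numerically, here is a minimal sketch that uses only Python's standard library. The helper `sum_of_uniforms` is hypothetical. It adds twelve Uniform(0, 1) draws and subtracts 6; each draw has mean 1/2 and variance 1/12, so the sum has mean 0 and variance 1. The result should land close to N(0, 1), including the familiar ≈68% of mass within one standard deviation.

```python
import math
import random
import statistics

def sum_of_uniforms(n_terms: int, n_samples: int, seed: int = 0) -> list[float]:
    """Draw n_samples values, each the centered sum of n_terms Uniform(0, 1) draws."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Each Uniform(0, 1) term contributes mean 1/2 and variance 1/12.
        total = sum(rng.random() for _ in range(n_terms))
        samples.append(total - n_terms / 2)  # center so the mean is 0
    return samples

samples = sum_of_uniforms(n_terms=12, n_samples=20_000)
mean = statistics.fmean(samples)
var = statistics.pvariance(samples)
# Fraction of samples within one standard deviation of the mean.
within_one = sum(abs(s) <= 1 for s in samples) / len(samples)

print(f"mean ≈ {mean:.3f}, variance ≈ {var:.3f}")  # close to 0 and 12 * (1/12) = 1
print(f"P(|X| <= 1) ≈ {within_one:.3f}; exact N(0, 1) value: {math.erf(1 / math.sqrt(2)):.3f}")
```

No single uniform draw looks anything like a bell. The Gaussian shape only emerges from the sum.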

The density p(x) = (1/√(2πσ²)) · exp(−(x−μ)²/(2σ²)) has fewer moving parts than it looks. The heart is exp(−(x−μ)²/(2σ²)): the distance from the mean, squared, scaled by 2σ², and made negative, so the density falls off fast as you move away from μ. The factor out front, 1/√(2πσ²), is just the constant that makes the total area equal 1.
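
A minimal sketch of that formula in Python, assuming only the standard library; `gaussian_pdf` and `riemann_area` are hypothetical helper names. The exponent and the normalizing constant are computed separately to mirror the two pieces described above, and a midpoint Riemann sum checks that the area really comes out to 1.

```python
import math

def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Normal density N(mu, sigma^2) evaluated at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # The heart: squared distance from the mean, scaled by 2*sigma^2, negated.
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    # The constant out front, 1/sqrt(2*pi*sigma^2), makes the total area 1.
    norm = 1.0 / math.sqrt(2 * math.pi * sigma ** 2)
    return norm * math.exp(exponent)

def riemann_area(mu: float, sigma: float, half_width: float = 10.0, steps: int = 100_000) -> float:
    """Approximate the area under the density with a midpoint sum over mu ± half_width*sigma."""
    lo = mu - half_width * sigma
    dx = 2 * half_width * sigma / steps
    return sum(gaussian_pdf(lo + (i + 0.5) * dx, mu, sigma) for i in range(steps)) * dx

print(gaussian_pdf(0.0))                # peak of N(0, 1): 1/sqrt(2*pi) ≈ 0.3989
print(riemann_area(mu=2.0, sigma=0.5))  # ≈ 1.0
```

Integrating over ±10σ is an arbitrary cutoff for the sketch; the mass beyond it is far below floating-point precision.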

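Changing μ slides the bell left or right; changing σ widens or sharpens it. A small σ gives a tall, confident spike; a large σ spreads belief thinly over a wide range. Because the area must stay 1, the peak height 1/(σ√(2π)) is inversely proportional to σ: halving σ doubles the peak.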
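
A small sketch of that trade-off, again assuming only Python's `math` module; `peak_height` and `mass_within` are hypothetical names. The probability of landing within a fixed distance a of the mean is erf(a/(σ√2)), so a narrow bell concentrates almost all its mass near μ while a wide one leaves most of it far away.

```python
import math

def peak_height(sigma: float) -> float:
    """Density at the mean, 1 / (sigma * sqrt(2*pi)); halving sigma doubles it."""
    return 1.0 / (sigma * math.sqrt(2 * math.pi))

def mass_within(a: float, sigma: float) -> float:
    """P(|X - mu| <= a) for X ~ N(mu, sigma^2), via the error function."""
    return math.erf(a / (sigma * math.sqrt(2)))

# Compare a confident spike, the standard normal, and a diffuse bell.
for sigma in (0.25, 1.0, 4.0):
    print(f"sigma={sigma}: peak={peak_height(sigma):.3f}, "
          f"P(|X - mu| <= 1)={mass_within(1.0, sigma):.3f}")
```

Neither quantity depends on μ, which only shifts the bell without changing its shape.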

Where this lives in ML

The first time a network touches a Gaussian is before training even starts: weight initialization often draws from a normal distribution scaled by layer size (He/Xavier init). Noise models that assume Gaussian residuals make least-squares regression the maximum-likelihood fit. A VAE's latent space has a Gaussian prior, and the reparameterization trick samples z = μ + σ·ε with ε ~ N(0,1), which is the z-score run…
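
Two of those connections can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not any framework's implementation: `reparameterize` and `gaussian_nll` are hypothetical names, and a real VAE would use a tensor library with autograd. The first function draws z = μ + σ·ε, so the z-score (z − μ)/σ hands back ε. The second shows that the Gaussian negative log-likelihood is the sum of squared errors divided by 2σ², plus a term that does not depend on the predictions.

```python
import math
import random

def reparameterize(mu: float, sigma: float, rng: random.Random) -> tuple[float, float]:
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, 1).

    Returns (z, eps). All randomness lives in eps, so z is a deterministic
    (and differentiable) function of mu and sigma.
    """
    eps = rng.gauss(0.0, 1.0)
    return mu + sigma * eps, eps

def gaussian_nll(y: list[float], y_hat: list[float], sigma: float) -> float:
    """Negative log-likelihood of targets y under y_i ~ N(y_hat_i, sigma^2)."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # Only the SSE term depends on y_hat, so minimizing this NLL over the
    # predictions is the same as minimizing the sum of squared errors.
    return sse / (2 * sigma ** 2) + 0.5 * n * math.log(2 * math.pi * sigma ** 2)

rng = random.Random(0)
z, eps = reparameterize(mu=1.5, sigma=0.2, rng=rng)
print((z - 1.5) / 0.2, eps)  # the z-score of z recovers eps, up to rounding
```

Fixing σ, as plain least-squares implicitly does, turns the second term into a constant, which is why the two fits coincide.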