MLE for Common Distributions

Inference, estimation, and decision-making from data

The MLE recipe is always the same: write the log-likelihood, take its derivative with respect to the parameter, set it to zero, solve. For the two distributions you'll meet most, the answer is beautifully simple: it's just a sample average.
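
To make the four steps concrete, here is a minimal numerical sketch in Python. It is an illustration under stated assumptions, not a production fitter: the data values are made up, the helper `score_root` is a hypothetical name, the derivative is approximated by a finite difference, and the normal example holds σ fixed at 1. Solving "derivative = 0" for each log-likelihood should land on the corresponding sample average.

```python
import math

def score_root(loglik, lo, hi, h=1e-6, iters=200):
    """Find where d(loglik)/d(theta) crosses zero on [lo, hi] by bisection.

    Assumes the log-likelihood has a single interior peak, so its slope
    is positive at lo and negative at hi.
    """
    def slope(theta):
        # Central finite difference standing in for the analytic derivative.
        return (loglik(theta + h) - loglik(theta - h)) / (2 * h)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if slope(mid) > 0:
            lo = mid   # still climbing: the peak is to the right
        else:
            hi = mid   # already descending: the peak is to the left
    return (lo + hi) / 2

# Made-up illustrative data (assumptions, not from the lesson).
measurements = [4.1, 5.3, 4.8, 6.0, 5.2]
flips = [1, 0, 1, 1, 0, 1, 1, 0]  # 1 = heads, 0 = tails

def normal_loglik(mu, sigma=1.0):
    # Normal log-likelihood as a function of mu; terms without mu are dropped.
    return -sum((x - mu) ** 2 for x in measurements) / (2 * sigma ** 2)

def bernoulli_loglik(p):
    heads = sum(flips)
    tails = len(flips) - heads
    return heads * math.log(p) + tails * math.log(1 - p)

mu_hat = score_root(normal_loglik, 0.0, 10.0)
p_hat = score_root(bernoulli_loglik, 0.01, 0.99)

# Each numerical root is printed next to the plain sample average.
print(mu_hat, sum(measurements) / len(measurements))
print(p_hat, sum(flips) / len(flips))
```

In both cases the root of the derivative and the plain average printed beside it should agree to many decimal places.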

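For data drawn from a normal distribution, maximizing the log-likelihood gives the most intuitive estimators possible: the sample mean x̄ for μ, and the average squared deviation from x̄ (dividing by n, not n − 1) for σ².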
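
For reference, the recipe applied to n independent draws x_1, …, x_n from a normal distribution with mean μ and variance σ² can be sketched as follows. The notation is assumed here for illustration; note that the variance estimate divides by n, not n − 1.

```latex
\begin{align*}
\ell(\mu, \sigma^2)
  &= -\frac{n}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 \\[4pt]
\frac{\partial \ell}{\partial \mu}
  &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0
  \;\Longrightarrow\;
  \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x} \\[4pt]
\frac{\partial \ell}{\partial \sigma^2}
  &= -\frac{n}{2\sigma^2}
     + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0
  \;\Longrightarrow\;
  \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2
\end{align*}
```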

Imagine you flip a bent coin a bunch of times to guess how biased it is. Maximum likelihood doesn't agonize over it: the single best guess for the chance of heads is just the fraction of heads you actually saw. The estimate p̂ is nothing more than the running tally turned into an average, the same plain sample mean x̄ in disguise.
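
The coin-flip answer follows from the same four steps. As a sketch, write x_i = 1 for heads and x_i = 0 for tails, and let k be the number of heads in n flips (these symbols are introduced here for illustration):

```latex
\begin{align*}
\ell(p) &= k \log p + (n - k)\log(1 - p),
  \qquad k = \sum_{i=1}^{n} x_i \\[4pt]
\frac{d\ell}{dp} &= \frac{k}{p} - \frac{n - k}{1 - p} = 0 \\[4pt]
k(1 - p) &= (n - k)\,p
  \;\Longrightarrow\; k = np
  \;\Longrightarrow\;
  \hat{p} = \frac{k}{n} = \bar{x}
\end{align*}
```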

Where this lives in ML

These closed forms are why the simplest models are so fast to fit. Linear regression is MLE under Gaussian noise and has a one-shot closed-form solution. Logistic regression is MLE for a Bernoulli/categorical label; it has no closed form, but the same principle drives the gradient steps. The recipe "log-likelihood → derivative → zero" is the skeleton of most likelihood-based fitting procedures.
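
The contrast between the two models can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: one input feature, made-up data, hypothetical function names, and plain fixed-step gradient ascent rather than whatever optimizer a real library would use. The linear fit solves "derivative = 0" in one shot; the logistic fit keeps stepping uphill until the derivative is nearly zero.

```python
import math

# Made-up illustrative data (assumptions, not from the lesson).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]   # continuous targets for linear regression
labels = [0, 0, 1, 0, 1]         # binary labels, deliberately not separable

def fit_linear(xs, ys):
    """One-shot least squares, which is the MLE under Gaussian noise."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_gradient(w, b, xs, labels):
    """Derivative of the Bernoulli log-likelihood with respect to (w, b)."""
    gw = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, labels))
    gb = sum(y - sigmoid(w * x + b) for x, y in zip(xs, labels))
    return gw, gb

def fit_logistic(xs, labels, lr=0.1, steps=5000):
    """Gradient ascent: no closed form, so step uphill until the slope is ~0."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw, gb = logistic_gradient(w, b, xs, labels)
        w += lr * gw
        b += lr * gb
    return w, b

slope, intercept = fit_linear(xs, ys)
w, b = fit_logistic(xs, labels)
gw, gb = logistic_gradient(w, b, xs, labels)

print("linear fit:", slope, intercept)
print("logistic fit:", w, b, "gradient at fit:", gw, gb)
```

The labels are chosen to overlap on purpose: with perfectly separable labels the logistic log-likelihood has no finite maximizer, and the gradient steps would keep growing the weights instead of settling.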