Inference, estimation, and decision-making from data
Sometimes the most important variable is one you never observe. Which cluster did this point come from? Which topic generated this document? These hidden latent variables Z make maximum likelihood hard: you can't just maximize the log-likelihood because it now contains a sum inside a log. Expectation–Maximization (EM) is the elegant fix.
EM breaks a hard joint optimization into two easy alternating steps, repeated until convergence:
The quantity EM actually pushes up each round is a lower bound on the log-likelihood called the ELBO (evidence lower bound). The E-step tightens the bound; the M-step raises it.