The mathematics of uncertainty
Given a joint p(x, y), suppose you only care about X and want to forget Y. You marginalize: sum (or integrate) the joint over all values of the unwanted variable. What's left is the marginal distribution of X alone.
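In symbols, the operation just described can be written as below. This is a minimal statement of the rule, assuming the marginal is written p(x) and that Y is discrete in the first form and continuous in the second.

```latex
p(x) = \sum_{y} p(x, y)
\qquad \text{or, for continuous } Y, \qquad
p(x) = \int p(x, y)\, dy
```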
The name comes from old probability tables: you'd add up each row and write the total in the margin. Those row-sums are the marginal of one variable, and the column-sums are the marginal of the other. Marginalizing means "sum (or integrate) out the variable you don't want."
Take that two-way height–weight table and suppose you only care about height, ignoring weight entirely. Add up each row of the joint p(x, y) and write the total in the margin: that row total is how often each height occurs, regardless of weight. Reading only those margin totals gives the marginal distribution of X, the one variable seen on its own.
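The row and column sums can be sketched in a few lines of Python. This is only an illustration: the table values, the category labels, and the names joint, p_height, and p_weight are all made up here, not taken from any real data.

```python
# Hypothetical joint table p(x, y): one row per height, one column per weight.
# The probabilities are invented for illustration and sum to 1.
heights = ["short", "medium", "tall"]
weights = ["light", "heavy"]
joint = [
    [0.20, 0.05],  # short
    [0.25, 0.20],  # medium
    [0.05, 0.25],  # tall
]

# Marginal of height: add up each row, summing out weight.
p_height = [sum(row) for row in joint]

# Marginal of weight: add up each column, summing out height.
p_weight = [sum(col) for col in zip(*joint)]

for h, p in zip(heights, p_height):
    print(f"p(height={h}) = {p:.2f}")
for w, p in zip(weights, p_weight):
    print(f"p(weight={w}) = {p:.2f}")
```

With these made-up numbers the height margins come out to 0.25, 0.45, and 0.30, and each set of margin totals sums to 1, as any marginal distribution must.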