Riemann Integration

Single-variable calculus from first principles

The integral answers the companion question to the derivative: not "how fast is this changing?" but "how much has accumulated?" Geometrically, the definite integral is the signed area between a curve and the x-axis: area above the axis counts as positive, area below it as negative.

Picture tracing a pond's outline onto graph paper and wanting its area. You can't multiply one width by one height, because the shore curves. So you count the little squares that fall inside the outline: the finer the grid, the more squares you count and the closer your total creeps to the true area. A Riemann sum is exactly that count, and the integral is the number it settles on as the squares shrink to nothing.

For a rectangle, area is just width × height. But a region under a curve has a wavy top, so there is no single height to multiply by. Bernhard Riemann's idea: slice the region into thin vertical rectangles, each narrow enough that the curve is nearly flat across it, add up their areas, then repeat with thinner and thinner slices.
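The slicing idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name riemann_sum, the choice of the left edge of each slice as its height, and the test curve f(x) = x² on [0, 1] are all picked for this example and are not part of the original lesson.

```python
def riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum of f over [a, b] using n equal-width slices."""
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + i * width        # left edge of the i-th slice (an illustrative choice)
        total += f(x) * width    # area of one thin rectangle: height * width
    return total


def square(x):
    # Hypothetical example curve; its exact area over [0, 1] is 1/3.
    return x * x


if __name__ == "__main__":
    # Thinner slices bring the sum closer to the exact area 1/3.
    for n in (10, 100, 1000, 10000):
        print(n, riemann_sum(square, 0.0, 1.0, n))
```

With only 10 slices the sum undershoots noticeably, because each rectangle uses the lowest height in its slice of a rising curve. As n grows the printed values approach 1/3, which is the limit the integral names.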

Where this lives in ML

In probability, expectation is an integral. The average value of a quantity over a continuous distribution is E[f(X)] = ∫ f(x) p(x) dx. Entropy is −∫ p(x) ln p(x) dx; the normalizing constant of a distribution is an integral; KL divergence is an integral. Continuous probability simply is integration. And when a model "averages over a distribution" it can't integrate exactly, it does the next best…
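To make E[f(X)] = ∫ f(x) p(x) dx concrete, here is a rough numerical sketch that approximates the integral by summing thin slices. Everything specific in it is an assumption chosen for illustration: the standard normal density as p, the quantity f(x) = x², the cutoff at ±10 (the density is negligible beyond it), the midpoint of each slice as the sample point, and the helper names.

```python
import math


def standard_normal_pdf(x):
    """Density p(x) of a standard normal distribution (illustrative choice of p)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def expectation(f, p, a, b, n):
    """Midpoint Riemann sum approximating the integral of f(x) * p(x) over [a, b]."""
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * width      # midpoint of the i-th slice
        total += f(x) * p(x) * width   # slice contribution to the integral
    return total


# Total probability mass: the integral of p alone should be close to 1.
mass = expectation(lambda x: 1.0, standard_normal_pdf, -10.0, 10.0, 10000)

# E[X^2] for a standard normal is its variance, which is 1.
second_moment = expectation(lambda x: x * x, standard_normal_pdf, -10.0, 10.0, 10000)

if __name__ == "__main__":
    print(mass, second_moment)
```

Both printed values should land very close to 1. This works here because the example is one-dimensional and the density is known in closed form.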