Taylor Polynomials

Single-variable calculus from first principles

A Taylor polynomial approximates a complicated function near a point by a simple polynomial, one built to match the function's value, slope, curvature, and so on, right at that point. Get enough of those to agree and the polynomial hugs the curve closely nearby.

The idea is layered. A constant matches the height. Add a linear term and you match the slope too (that's the tangent line). Add a quadratic term and you match the curvature. Each new term fixes one more derivative.
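Written out, the layering looks like the formula below. The symbols are assumptions added here for illustration, since the text above does not name them: f is the function, a is the point you expand about, and n is the degree you stop at.

```latex
P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\,(x-a)^k
       = f(a) + f'(a)\,(x-a) + \frac{f''(a)}{2}\,(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}\,(x-a)^n,
\qquad
P_n^{(k)}(a) = f^{(k)}(a) \quad \text{for } k = 0, 1, \dots, n.
```

The k! in each denominator is what makes the matching work: differentiating (x-a)^k exactly k times leaves k!, which cancels, so the k-th derivative of the polynomial at a is exactly the k-th derivative of f at a.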

Slide the number of terms in the figure and watch a low-order polynomial peel away from the curve, while a higher-order one clings to it over a wider range.
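If you don't have the figure to hand, a rough numerical stand-in is sketched below. It is only an illustration under assumed choices: the function exp, the expansion point 0, and the particular degrees and distances are all picked here for convenience, and the names taylor_exp and errors are hypothetical helpers, not part of the course material.

```python
import math


def taylor_exp(x, a, n):
    """Degree-n Taylor polynomial of exp about a, evaluated at x."""
    # Every derivative of exp at a equals exp(a), so the k-th coefficient is exp(a) / k!.
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(n + 1))


def errors(a=0.0, degrees=(1, 2, 4, 8), offsets=(0.1, 1.0, 3.0)):
    """Map each degree to the absolute errors at distance h from a, for each h in offsets."""
    table = {}
    for n in degrees:
        table[n] = [abs(math.exp(a + h) - taylor_exp(a + h, a, n)) for h in offsets]
    return table


if __name__ == "__main__":
    # One row per degree, one column per distance from the expansion point.
    for n, errs in errors().items():
        print(n, ["%.2e" % e for e in errs])
```

Reading down a column, the error shrinks as the degree goes up. Reading across a row, it grows as you move away from the expansion point, and it grows more slowly for the higher degrees.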

Where this lives in ML

Taylor expansion is everywhere in optimisation. Gradient descent uses the linear (first-order) Taylor term, stepping downhill along the slope. Newton's method uses the quadratic term, fitting a parabola and jumping to its minimum. The whole hierarchy of optimisers comes down to "how many Taylor terms do we keep?" And linearising a nonlinearity near its operating point is how you analyse a network's local…
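To see the first-order and second-order ideas side by side, here is a minimal one-dimensional sketch. Everything specific in it is an assumption made for illustration: the test function f(x) = exp(x) - 2x (whose minimum sits at x = ln 2), the learning rate 0.1, the starting point 0, and the helper names. The gradient step keeps only the linear term, so it needs a step size to say how far to go. The Newton step minimises the local parabola f(x) + f'(x) d + f''(x) d^2 / 2, which gives d = -f'(x) / f''(x).

```python
import math


def f_prime(x):
    # First derivative of the assumed test function f(x) = exp(x) - 2x.
    return math.exp(x) - 2.0


def f_second(x):
    # Second derivative of the same function; positive everywhere, so the parabola opens upward.
    return math.exp(x)


def gradient_step(x, lr=0.1):
    # First-order model: move against the slope by a fixed fraction lr of it.
    return x - lr * f_prime(x)


def newton_step(x):
    # Second-order model: jump to the minimum of the fitted parabola.
    return x - f_prime(x) / f_second(x)


def run(step, x0=0.0, n_steps=5):
    """Apply the given step function n_steps times, starting from x0."""
    x = x0
    for _ in range(n_steps):
        x = step(x)
    return x


if __name__ == "__main__":
    target = math.log(2.0)  # true minimiser of the assumed test function
    print("gradient descent error:", abs(run(gradient_step) - target))
    print("Newton error:          ", abs(run(newton_step) - target))
```

On this smooth, convex example, the Newton iterate lands far closer to the minimum in the same number of steps. The price is that it needs the second derivative, and on less friendly functions the parabola can point the wrong way.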