Applications

Single-variable calculus from first principles

Taylor's real payoff in ML is linearisation: replacing a stubborn nonlinear function with its tangent line near a point of interest. Over a small range the linear approximation is nearly exact, and linear things are far easier to analyse, compute, and reason about.
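Why "nearly exact" over a small range: Taylor's theorem with the Lagrange remainder gives the tangent line's error directly. This is a short derivation sketch, assuming f has a continuous second derivative near a:

```latex
f(a + h) = f(a) + f'(a)\,h + \tfrac{1}{2} f''(\xi)\,h^{2}, \qquad \xi \text{ between } a \text{ and } a + h

\left| f(a + h) - \bigl(f(a) + f'(a)\,h\bigr) \right| \le \tfrac{1}{2} \max_{|t - a| \le |h|} \left| f''(t) \right| \, h^{2}
```

The error shrinks like h², so halving the distance from a roughly quarters the error. This second-order accuracy is what makes the tangent line trustworthy close to the point of interest.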

The sigmoid σ(x) = 1/(1 + e⁻ˣ) is the familiar squashing nonlinearity. Near x = 0 it passes through ½ with slope ¼, so its linear approximation is:

σ(x) ≈ ½ + ¼·x

A flat paper street map treats the round Earth as a plane near one city. Over a few kilometres the curvature is too tiny to matter, so the flat sheet is accurate enough to navigate by, even though the planet is really a sphere. Linearisation does the same to a function: near a point it swaps the true curve for the tangent line f(x) ≈ f(0) + f′(0)·x, exact enough locally and far easier to work with.
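The following is a minimal numerical check of the sigmoid's tangent line at 0, ½ + ¼·x, in Python. `linearise` is a hypothetical helper written for this sketch. It estimates f′(a) with a central difference rather than differentiating symbolically, so the slope it reports is approximate.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def linearise(f, a: float, h: float = 1e-5):
    """Hypothetical helper: tangent line of f at a, x -> f(a) + f'(a)*(x - a).

    f'(a) is estimated with a central difference (truncation error of order h^2),
    so the returned slope is a numerical approximation, not an exact derivative.
    """
    fa = f(a)
    slope = (f(a + h) - f(a - h)) / (2 * h)
    return fa, slope, (lambda x: fa + slope * (x - a))


value, slope, tangent = linearise(sigmoid, 0.0)
print(f"sigma(0) = {value:.4f}, sigma'(0) approx {slope:.4f}")

# Compare the true curve with its tangent line as x moves away from 0.
for x in [0.1, 0.5, 1.0, 2.0, 4.0]:
    err = abs(sigmoid(x) - tangent(x))
    print(f"x={x:>4}: sigma={sigmoid(x):.6f}  tangent={tangent(x):.6f}  |error|={err:.2e}")
```

Close to 0 the error is tiny (about 2 × 10⁻⁵ at x = 0.1). By x = 4 the tangent line gives 1.5, outside the sigmoid's (0, 1) range, and the error exceeds 0.5. Because σ″(0) = 0, the leading error term is cubic, roughly x³/48, so the sigmoid's tangent line at 0 holds up even better than the generic h² bound suggests.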

Where this lives in ML

Linearisation is a core ML reflex. The small-angle and small-input approximations simplify analysis of activations (sigmoid, GELU, softmax) near their operating point. Linearising a network around its current weights gives the neural tangent kernel view and underlies how we reason about training dynamics. And every first-order optimiser is, at heart, trusting a local linear model of the loss for…
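To make the optimiser point concrete, here is a minimal sketch that assumes a hypothetical one-parameter loss L(w) = log(1 + e⁻ʷ), the logistic loss for a single positive example with input 1. It takes one gradient-descent step from w = 0 and compares two quantities: the loss change predicted by the local linear model L(w + Δ) ≈ L(w) + L′(w)·Δ, and the change actually observed. `loss`, `grad` and `compare_step` are illustrative names, not part of any library.

```python
import math


def loss(w: float) -> float:
    """Hypothetical 1-D loss: logistic loss log(1 + e^(-w))."""
    return math.log1p(math.exp(-w))


def grad(w: float) -> float:
    """Exact derivative of the loss: -1 / (1 + e^w), i.e. sigmoid(w) - 1."""
    return -1.0 / (1.0 + math.exp(w))


def compare_step(w: float, lr: float):
    """Take one gradient-descent step from w and return
    (loss change predicted by the linear model, loss change actually observed)."""
    g = grad(w)
    delta = -lr * g                    # gradient-descent update
    predicted = g * delta              # linear model: L(w + delta) - L(w) ~ L'(w) * delta
    actual = loss(w + delta) - loss(w)
    return predicted, actual


for lr in [0.1, 1.0, 10.0]:
    predicted, actual = compare_step(0.0, lr)
    print(f"lr={lr:>5}: predicted change={predicted:+.4f}  actual change={actual:+.4f}")
```

At lr = 0.1 the predicted change (−0.025) and the true change (about −0.0247) agree closely. At lr = 10 the linear model predicts a drop of 2.5, more than the entire starting loss of log 2 ≈ 0.69, while the true drop is only about 0.69. This is one way to see why first-order steps are kept small: the linear model is only trusted locally.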