The Derivative

Single-variable calculus from first principles

The derivative answers one question: how fast is a function changing at a single instant? Geometrically, that is the slope of the curve right at one point, the slope of the tangent line that just kisses the curve there.

Think of the speedometer in a moving car. Your average speed over an hour is total distance divided by total time, but the needle shows something sharper: exactly how fast you're going at this very instant. The derivative is that needle, the rate of change frozen at a single moment rather than smeared across an interval.

But here's the puzzle. Slope needs two points: rise over run. A single point gives you nowhere to measure from. So how can a lone point have a slope at all? The trick is to sneak up on it.

Where this lives in MLThe gradient that trains every neural network is exactly this derivative, applied to the loss. The quantity ∂L/∂w is the slope of the loss as you nudge one weight w: its sign tells you which direction reduces the loss, and its magnitude tells you how sensitive the loss is to that weight. Training is just: evaluate this limit (an autograd engine does it for you, exactly — no shrinking h needed),…
▶ The Derivative
← ContinuityDifferentiability →