Gradient Descent Preview

Single-variable calculus from first principles

Suppose you want the lowest point of a curve but you can only see the ground directly under your feet — you can feel the slope, nothing more. What do you do? Simple: step in the downhill direction, then feel again, then step again. Repeat. That's gradient descent, the algorithm that trains essentially every modern AI model.

Imagine walking downhill in fog so dense you can't see a step ahead. You can't spot the bottom of the valley, but you can still feel with your foot which way the ground slopes down, and take a step that way. Feel, step, feel, step. Gradient descent is exactly this blind, patient shuffle toward the lowest ground.

Written as a rule that updates your position each step:
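Assuming the position is called x, the curve f, and the step size η (x and f are names chosen here for illustration), the single-variable form of the rule can be written x ← x − η·f′(x), mirroring the weight update w ← w − η∇L used in deep learning. The sketch below is one minimal way to run that loop in Python. The example curve f(x) = (x − 3)² and the helper names `gradient_descent` and `df` are hypothetical, picked only because the lowest point (x = 3) is easy to check by hand.

```python
def gradient_descent(df, x0, eta=0.1, steps=50):
    """Repeatedly apply x <- x - eta * df(x) and return every position visited."""
    x = x0
    path = [x]
    for _ in range(steps):
        x = x - eta * df(x)  # feel the slope, then step against it
        path.append(x)
    return path


# Hypothetical example curve: f(x) = (x - 3)**2, whose lowest point is at x = 3.
def df(x):
    return 2 * (x - 3)  # derivative of (x - 3)**2


path = gradient_descent(df, x0=0.0)
print(path[0], path[1], path[-1])  # start, first step, final position
```

Starting left of the minimum, the slope is negative, so subtracting η·f′(x) moves x to the right, which is downhill. On this particular curve each step shrinks the distance to x = 3 by a constant factor when η = 0.1, while any η above 1 makes each step overshoot further than the last.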

Where this lives in ML

This single line is the heart of every optimizer in deep learning. The weight update is identical in spirit: w ← w − η∇L, where η is the step size (the learning rate) and ∇L is the multi-dimensional derivative (the gradient) covered in the next course. SGD, Adam, RMSProp and the rest are all refinements of this skeleton (smarter step sizes, momentum, per-parameter rates), but the bones are exactly the rule above. Non-convexity is why deep…
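To make the "same skeleton, plus refinements" point concrete, here is a rough sketch of w ← w − η∇L on a two-parameter loss, with an optional momentum term. Everything specific is an assumption for illustration: the loss L(w) = (w₀ − 1)² + 2(w₁ + 2)², the names `grad_L` and `descend`, and the particular momentum form (a running sum of gradients scaled by `beta`), which is one common variant rather than the exact formula of any named optimizer.

```python
def grad_L(w):
    # Hypothetical loss L(w) = (w[0] - 1)**2 + 2 * (w[1] + 2)**2, minimized at (1, -2).
    return [2 * (w[0] - 1), 4 * (w[1] + 2)]


def descend(w, eta=0.1, beta=0.0, steps=300):
    """Apply w <- w - eta * velocity, where velocity accumulates past gradients."""
    velocity = [0.0] * len(w)
    for _ in range(steps):
        g = grad_L(w)
        # With beta = 0 the velocity is just the gradient,
        # so this reduces to the plain rule w <- w - eta * grad_L(w).
        velocity = [beta * v + gi for v, gi in zip(velocity, g)]
        w = [wi - eta * vi for wi, vi in zip(w, velocity)]
    return w


w_plain = descend([0.0, 0.0])               # plain gradient descent
w_momentum = descend([0.0, 0.0], beta=0.9)  # same rule with momentum
print(w_plain, w_momentum)
```

Both calls should end up close to the minimum at (1, −2). The only difference between them is how the step direction is built from the gradients, which is the sense in which the refinements keep the same bones.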