Chain Rule: Scalar Composition

Multivariate calculus from first principles

Strip backpropagation down to its math and you find this module. The multivariate chain rule tells you how to differentiate a composition of functions, which is the one thing an autograd engine actually does. We start with the scalar version: how a change in one input ripples through intermediate variables to the output.

Suppose z depends on intermediates y₁, y₂, …, which in turn depend on inputs x. To find how z changes with one input, sum over every path from that input to the output, multiplying derivatives along each path:

Each term (∂z/∂yₖ)(∂yₖ/∂xᵢ) is one route's contribution; you add up all the routes. If there's only one path, it collapses to the familiar 1-D chain rule.

Where this lives in MLThis sum-over-paths is exactly the backward pass through one node of a network. Each intermediate yₖ is a neuron's activation; ∂z/∂yₖ is the gradient flowing back into it; ∂yₖ/∂xᵢ is the local derivative of that operation. Multiply and sum, and you have propagated the gradient one step back. Repeat that step across the whole graph and you have trained the model.
▶ Chain Rule: Scalar Composition
← Hessian GeometryChain Rule: Matrix Form →