Multivariate calculus from first principles
Strip backpropagation down to its math and you find this module. The multivariate chain rule tells you how to differentiate a composition of functions, which is the one thing an autograd engine actually does. We start with the scalar version: how a change in one input ripples through intermediate variables to the output.
Suppose z depends on intermediates y₁, y₂, …, which in turn depend on inputs x. To find how z changes with one input, sum over every path from that input to the output, multiplying derivatives along each path:
Each term (∂z/∂yₖ)(∂yₖ/∂xᵢ) is one route's contribution; you add up all the routes. If there's only one path, it collapses to the familiar 1-D chain rule.