Chain Rule: Matrix Form

Multivariate calculus from first principles

The sum-over-paths formula is really matrix multiplication written out term by term. When functions are vector-valued, the chain rule collapses into a clean product of Jacobians, and this is the form that powers real autograd systems.

For a composition f ∘ g, the Jacobian of the whole is the Jacobian of the outer map (evaluated at the inner output) times the Jacobian of the inner map:

J_{f∘g}(x) = J_f(g(x)) · J_g(x)

The shape check is what makes it click. If g: Rⁿ → Rᵏ and f: Rᵏ → Rᵐ, then J_g is k×n, J_f is m×k, and their product is m×n, exactly the shape the overall map Rⁿ → Rᵐ demands. The inner dimension k cancels, just as in ordinary matrix multiplication.
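To make the shape check concrete, here is a minimal sketch in plain Python. The maps `g` and `f` below are made-up examples, not taken from any library, with n = 2, k = 3, and m = 2. The sketch multiplies the hand-written Jacobians and compares the result against a finite-difference estimate of the Jacobian of the composition.

```python
import math

# Hypothetical inner map g: R^2 -> R^3, chosen only for illustration.
def g(x):
    a, b = x
    return [a * b, a + b, a * a]

# Its Jacobian, written by hand: shape 3x2 (k x n).
def jac_g(x):
    a, b = x
    return [[b, a],
            [1.0, 1.0],
            [2.0 * a, 0.0]]

# Hypothetical outer map f: R^3 -> R^2.
def f(u):
    p, q, r = u
    return [p + q * r, math.sin(p)]

# Its Jacobian, written by hand: shape 2x3 (m x k).
def jac_f(u):
    p, q, r = u
    return [[1.0, r, q],
            [math.cos(p), 0.0, 0.0]]

def matmul(A, B):
    # The inner dimensions must agree; this is the "k cancels" step.
    assert len(A[0]) == len(B), "inner dimensions do not match"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def numeric_jacobian(h, x, eps=1e-6):
    # Central differences, one input coordinate at a time.
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        hp, hm = h(xp), h(xm)
        cols.append([(u - v) / (2 * eps) for u, v in zip(hp, hm)])
    # cols[j][i] holds d h_i / d x_j, so transpose into row-major form.
    return [[cols[j][i] for j in range(len(x))] for i in range(len(cols[0]))]

x0 = [0.5, -1.2]

# Chain rule: outer Jacobian evaluated at g(x0), times inner Jacobian at x0.
J_chain = matmul(jac_f(g(x0)), jac_g(x0))

# Independent check on the composed map itself.
J_numeric = numeric_jacobian(lambda x: f(g(x)), x0)

max_err = max(abs(u - v)
              for row_c, row_n in zip(J_chain, J_numeric)
              for u, v in zip(row_c, row_n))

print("shape:", len(J_chain), "x", len(J_chain[0]))
print("max abs difference vs finite differences:", max_err)
```

The product comes out 2×2, matching the overall map R² → R², and the two Jacobians should agree up to finite-difference error. Swapping the order of the factors in `matmul` would trip the dimension assertion, which is the shape check doing its job.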

Where this lives in ML

This product is why deep networks suffer vanishing and exploding gradients. Multiply many Jacobians whose singular values sit below 1 and the product shrinks toward nothing; let them sit above 1 and it blows up. Residual connections, careful initialization, and normalization all exist to keep this Jacobian product near a healthy scale so gradients survive the trip back through many layers.
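The effect of depth on a Jacobian product can be sketched with a toy model. This is an illustrative assumption, not how a real network behaves: each layer's Jacobian is taken to be a 2×2 rotation scaled by a constant, so every singular value equals that constant. The helper names `scaled_rotation` and `product_norm` are hypothetical.

```python
import math

def scaled_rotation(scale, angle):
    # A rotation times a scalar: both singular values equal `scale`.
    c, s = math.cos(angle), math.sin(angle)
    return [[scale * c, -scale * s],
            [scale * s,  scale * c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def frobenius_norm(A):
    return math.sqrt(sum(v * v for row in A for v in row))

def product_norm(scale, depth, angle=0.3):
    # Multiply `depth` identical toy Jacobians, as backpropagation
    # would chain them across layers, and measure the result's size.
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(depth):
        P = matmul(scaled_rotation(scale, angle), P)
    return frobenius_norm(P)

depth = 50
shrinking = product_norm(0.9, depth)  # singular values below 1
stable = product_norm(1.0, depth)     # singular values exactly 1
growing = product_norm(1.1, depth)    # singular values above 1

print("scale 0.9:", shrinking)
print("scale 1.0:", stable)
print("scale 1.1:", growing)
```

In this toy setting the norm of the product is the scale raised to the depth, times √2, so a scale only slightly off 1 compounds into a very small or very large number after 50 layers. Real layer Jacobians have a spread of singular values and differ from layer to layer, but the same compounding is at work.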