Product & Quotient Rules

Single-variable calculus from first principles

When two functions are multiplied together, you can't just multiply their derivatives. That's a tempting shortcut, and a wrong one. The right rule accounts for the fact that both factors are changing at once.
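A quick numerical check makes the point concrete. The sketch below is illustrative only: the functions f(x) = x² and g(x) = x³, the helper `derivative`, and the test point x = 2 are all assumptions chosen for the example, not anything fixed by the lesson. It compares the true slope of the product with the tempting shortcut f′·g′ and with the product rule f′·g + f·g′.

```python
def derivative(fn, x, h=1e-6):
    """Central-difference estimate of fn'(x) (hypothetical helper)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

def f(x):
    return x ** 2

def g(x):
    return x ** 3

def product(x):
    return f(x) * g(x)  # equals x**5

x = 2.0

# Slope of the product itself: d/dx x^5 = 5x^4, which is 80 at x = 2.
true_slope = derivative(product, x)

# The tempting shortcut: multiply the two derivatives (2x * 3x^2 = 48 at x = 2).
naive = derivative(f, x) * derivative(g, x)

# The product rule: f'g + fg' (4*8 + 4*12 = 80 at x = 2).
product_rule = derivative(f, x) * g(x) + f(x) * derivative(g, x)

print(f"true slope   ~ {true_slope:.4f}")
print(f"f' * g'      ~ {naive:.4f}")
print(f"f'g + fg'    ~ {product_rule:.4f}")
```

The shortcut lands near 48 while the true slope is near 80, and the two-term rule matches the true slope up to finite-difference error.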

Picture a rectangle whose width is f and height is g; its area is f·g. If both sides grow a little, the area grows on two fronts: a strip from the wider width, plus a strip from the taller height. That's why the answer has two terms, not one.
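The rectangle picture can be written out as a short worked equation. This is a sketch in standard notation, where Δf and Δg (symbols introduced here, not in the text above) stand for the small changes in width and height caused by a small change Δx:

```latex
\begin{align*}
\Delta(fg) &= (f + \Delta f)(g + \Delta g) - fg \\
           &= \underbrace{\Delta f \cdot g}_{\text{strip from wider width}}
            + \underbrace{f \cdot \Delta g}_{\text{strip from taller height}}
            + \underbrace{\Delta f \,\Delta g}_{\text{tiny corner}} .
\end{align*}
% Divide by \Delta x and let \Delta x \to 0. The corner term
% (\Delta f / \Delta x)\,\Delta g vanishes because \Delta g \to 0, leaving
\[
(fg)' = f'\,g + f\,g' .
\]
```

The corner piece is the product of two small changes, so it shrinks faster than either strip and drops out in the limit. Only the two strips survive.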

Picture a rectangular garden whose width and height are both being extended at once. The new area isn't just one strip: you gain a strip along the longer width and a strip along the taller height. That's why the product rule has two terms: when two changing quantities multiply, each one's growth contributes its own slice to the total.

Where this lives in ML

These rules are the building blocks autograd composes. A normalised score like a softmax probability or an attention weight is a quotient (something over a sum), and differentiating it uses the quotient rule under the hood. Batch-norm's scaling, layer-norm's division by a standard deviation: wherever a network divides one learned quantity by another, the quotient rule is what the gradient engine…
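To see the "something over a sum" case worked by hand, here is a minimal sketch for a two-class softmax. It is not how any particular autograd library is implemented: the function names `softmax_first` and `softmax_first_grad` and the input values are hypothetical, chosen only for illustration. The gradient is built with the quotient rule (u/v)′ = (u′v − uv′)/v² and then checked against a finite-difference estimate.

```python
import math

def softmax_first(a, b):
    """Probability of the first of two classes: e^a / (e^a + e^b)."""
    return math.exp(a) / (math.exp(a) + math.exp(b))

def softmax_first_grad(a, b):
    """Derivative of softmax_first with respect to a, via the quotient rule."""
    u = math.exp(a)                 # numerator
    v = math.exp(a) + math.exp(b)   # denominator (the sum)
    du = math.exp(a)                # d(numerator)/da
    dv = math.exp(a)                # d(denominator)/da, with b held fixed
    return (du * v - u * dv) / (v * v)

a, b, h = 0.5, -1.0, 1e-6           # illustrative inputs and step size

analytic = softmax_first_grad(a, b)
numeric = (softmax_first(a + h, b) - softmax_first(a - h, b)) / (2 * h)
p = softmax_first(a, b)

print(f"quotient rule     : {analytic:.8f}")
print(f"finite difference : {numeric:.8f}")
print(f"p * (1 - p)       : {p * (1 - p):.8f}")
```

All three printed values should agree closely. The last line reflects the familiar simplification that this derivative equals p(1 − p), which follows from the same quotient-rule expression.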