Extreme Values of Functions
of Several Variables

Calculus III Handout

March 17, 2009

Recall that if S is a subset of n-dimensional space and P is a point of S, we say that P is a point in the interior of S, or a point inside S, if there is some (small) positive number r such that every point of n-dimensional space within distance r of P is a point of S.

Recall that a function f of n variables is differentiable at a point inside its domain if it admits first order approximation by a linear function near the given point.

Theorem.   If a function f of n variables has an extreme value for the subset S of its domain at a point P of S that is a point inside the domain of f where f is differentiable, then the gradient vector $\nabla f(P)$ of f at P must be perpendicular to the tangent vector at P of every differentiably parameterized curve lying in S and passing through P.

Proof. Let $G(t)$ be a differentiably parameterized curve contained in S and passing through P when $t = a$. Since S is contained in the domain of f, the function $h(t) = f(G(t))$ is defined for all values of t for which $G(t)$ is defined, and since f is differentiable at $P = G(a)$, the function h is differentiable at a. In fact, the “chain rule” tells us that $h'(a) = \nabla f(P) \cdot G'(a)$. Since f has an extreme value relative to the set S at the point P and each $G(t)$ is in S, it follows that h, a function of one variable, has a local extreme value at $t = a$, and, therefore, that $h'(a) = 0$. Consequently, $\nabla f(P)$ is perpendicular to the tangent vector $G'(a)$ of the curve at P.
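
The argument in this proof can be checked numerically on a small example. The sketch below is only an illustration: the function f(x, y) = x + y, the unit circle as the set S, and the parameterization by (cos t, sin t) are all choices made here and do not come from the handout. It compares the chain rule value of the derivative of h at a with a finite difference estimate at the point where f is largest on the circle.

```python
import math

# Illustrative choices (not from the handout): f(x, y) = x + y,
# S = the unit circle, parameterized by G(t) = (cos t, sin t).
def f(x, y):
    return x + y

def grad_f(x, y):
    # Gradient of f(x, y) = x + y is constant.
    return (1.0, 1.0)

def G(t):
    return (math.cos(t), math.sin(t))

def G_prime(t):
    # Tangent vector of the curve at parameter value t.
    return (-math.sin(t), math.cos(t))

# f restricted to the circle is largest at t = pi/4.
a = math.pi / 4
P = G(a)

# Chain rule: derivative of h(t) = f(G(t)) at a is grad f(P) . G'(a).
chain_rule_value = sum(g * v for g, v in zip(grad_f(*P), G_prime(a)))

# Independent estimate of the same derivative by a central difference.
eps = 1e-6
finite_difference = (f(*G(a + eps)) - f(*G(a - eps))) / (2 * eps)

print("chain rule:", chain_rule_value)
print("finite difference:", finite_difference)
```

Both numbers should be zero up to rounding error, which is the perpendicularity asserted by the theorem for this particular curve.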

Corollary 1.   If a function f of n variables has an extreme value for the subset S of its domain at a point P of S that is a point inside S where f is differentiable, then the gradient vector $\nabla f(P)$ must be the zero vector.

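Proof. If P is a point inside S, then every sufficiently short line segment passing through P lies in S, so by the theorem its direction must be perpendicular to $\nabla f(P)$. This means that every vector must be perpendicular to $\nabla f(P)$. In particular, $\nabla f(P)$ is perpendicular to itself, which is possible only if it is the zero vector.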
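
As a hedged illustration of Corollary 1, the sketch below uses an example chosen here, not taken from the handout: f(x, y) = (x - 1)^2 + (y + 2)^2 on the closed disc of radius 5 about the origin. A coarse grid search locates the minimum, which falls at an interior point, and a central difference estimate of the gradient there comes out (approximately) zero. The helper name numerical_gradient is hypothetical.

```python
# Illustrative choices (not from the handout):
# f(x, y) = (x - 1)^2 + (y + 2)^2 and S = the disc x^2 + y^2 <= 25.
def f(x, y):
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

def numerical_gradient(func, x, y, eps=1e-6):
    # Central difference estimate of the gradient of func at (x, y).
    dfdx = (func(x + eps, y) - func(x - eps, y)) / (2 * eps)
    dfdy = (func(x, y + eps) - func(x, y - eps)) / (2 * eps)
    return (dfdx, dfdy)

# Coarse grid search for the minimum of f over the disc S.
steps = 200
best = None
for i in range(-steps, steps + 1):
    for j in range(-steps, steps + 1):
        x = 5.0 * i / steps
        y = 5.0 * j / steps
        if x * x + y * y <= 25.0:
            value = f(x, y)
            if best is None or value < best[0]:
                best = (value, x, y)

min_value, px, py = best
gradient_at_min = numerical_gradient(f, px, py)

# The minimizer is strictly inside the disc, so Corollary 1 applies.
is_interior = px * px + py * py < 25.0

print("minimizer:", (px, py), "interior:", is_interior)
print("gradient there:", gradient_at_min)
```

A grid search is only a rough way to locate an extreme value; it is used here because it does not presuppose the gradient condition being illustrated.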

Corollary 2.   If a function f of n variables has an extreme value for the subset $S = \{g = 0\}$ of its domain at a point P of S where f and g are differentiable functions, then the gradient $\nabla f(P)$ of f and the gradient $\nabla g(P)$ of g must be parallel vectors.

Proof. The statement is formally true, but probably useless, if $\nabla g(P) = 0$. We assume that $\nabla g(P)$ is not the zero vector. In this case $\nabla g(P)$ is perpendicular to the tangent hyperplane (i.e., plane if n=3 or line if n=2) to S at P. Every unit vector in the tangent hyperplane is tangent to some small differentiably parameterized curve segment lying in S and passing through P. Hence, by the theorem, $\nabla f(P)$ is also perpendicular to each such curve segment, and, hence, to the tangent hyperplane. Since a hyperplane has only one parallel class of normal vectors, $\nabla f(P)$ and $\nabla g(P)$ must be parallel.
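
Corollary 2 can be illustrated with a small numerical check. The example is an assumption made here, not part of the handout: f(x, y) = xy constrained to the circle where g(x, y) = x^2 + y^2 - 2 vanishes. The sketch samples the circle to find where f is largest and then tests whether the two gradients are parallel there, using the fact that two plane vectors are parallel exactly when their 2-by-2 determinant is zero. The helper name cross_2d is hypothetical.

```python
import math

# Illustrative choices (not from the handout):
# f(x, y) = x*y, constraint g(x, y) = x^2 + y^2 - 2 = 0.
def f(x, y):
    return x * y

def grad_f(x, y):
    return (y, x)

def g(x, y):
    return x * x + y * y - 2.0

def grad_g(x, y):
    return (2.0 * x, 2.0 * y)

def cross_2d(u, v):
    # Determinant of the 2x2 matrix with rows u and v;
    # it is zero exactly when u and v are parallel.
    return u[0] * v[1] - u[1] * v[0]

# Sample the constraint circle of radius sqrt(2) and pick the largest f.
n = 3600
radius = math.sqrt(2.0)
samples = [
    (radius * math.cos(2 * math.pi * k / n),
     radius * math.sin(2 * math.pi * k / n))
    for k in range(n)
]
best = max(samples, key=lambda p: f(*p))
cross_at_max = cross_2d(grad_f(*best), grad_g(*best))

# For contrast: a point of the circle where f has no extreme value.
other = (radius, 0.0)
cross_at_other = cross_2d(grad_f(*other), grad_g(*other))

print("maximizer:", best, "determinant:", cross_at_max)
print("non-extreme point:", other, "determinant:", cross_at_other)
```

The determinant should be near zero at the sampled maximizer and clearly nonzero at the other point, so the parallel gradient condition separates candidates for extreme values from the remaining points of the constraint set.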

Remark. The theorem is useful also in the case where f is a function of 3 variables and the constraint set S is a curve in space. Then the fact that P lies in S corresponds roughly to two equations for P, and the orthogonality condition of the theorem provides, in non-degenerate situations, an additional equation, with the result that (usually) only finitely many such P are possible. (Among these are points that are maxima, minima, and those that are neither.) This is equivalent to the principle of “Lagrange multipliers” discussed in the text.
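
A minimal sketch of the situation in this remark, under assumptions chosen here for illustration: f(x, y, z) = x + 2z, and S is the space curve cut out by the two equations x^2 + y^2 = 1 and z = y, parameterized by (cos t, sin t, sin t). The orthogonality condition of the theorem then becomes a single equation in t, and scanning one full period for sign changes (refined by bisection) finds the finitely many candidate points. The helper names orthogonality and bisect are hypothetical.

```python
import math

# Illustrative choices (not from the handout):
# f(x, y, z) = x + 2z on the curve x^2 + y^2 = 1, z = y.
def f(p):
    x, y, z = p
    return x + 2.0 * z

def grad_f(p):
    # Gradient of f(x, y, z) = x + 2z is constant.
    return (1.0, 0.0, 2.0)

def G(t):
    return (math.cos(t), math.sin(t), math.sin(t))

def G_prime(t):
    return (-math.sin(t), math.cos(t), math.cos(t))

def orthogonality(t):
    # grad f(G(t)) . G'(t); the theorem requires this to vanish
    # at any extreme value of f on the curve.
    return sum(a * b for a, b in zip(grad_f(G(t)), G_prime(t)))

def bisect(func, lo, hi, iterations=60):
    # Plain bisection; assumes func changes sign on [lo, hi].
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Scan one period of the parameter for sign changes of the condition.
n = 360
candidates = []
for k in range(n):
    lo = 2 * math.pi * k / n
    hi = 2 * math.pi * (k + 1) / n
    if orthogonality(lo) * orthogonality(hi) < 0:
        candidates.append(bisect(orthogonality, lo, hi))

for t in candidates:
    print("t =", t, "P =", G(t), "f(P) =", f(G(t)))
```

In this example the scan finds two candidate points, one where f is largest on the curve and one where it is smallest; other choices of f and S may also produce candidates that are neither.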