Recall that if $S$ is a subset of $n$-dimensional space and $p$ is a point of $S$, we say that $p$ is a point in the interior of $S$, or a point inside $S$, if there is some (small) positive number $\epsilon$ such that every point of $n$-dimensional space within distance $\epsilon$ of $p$ is a point of $S$.
Recall that a function $f$ of $n$ variables is differentiable at a point inside its domain if it admits first order approximation by a linear function near the given point.
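One standard way to write the first order approximation condition is sketched below. The symbols are assumptions made for illustration: $f$ is the function, $p$ the point inside its domain, $h$ a small displacement, and $E(h)$ the error term.

```latex
\[
  f(p + h) \;=\; f(p) + \nabla f(p) \cdot h + E(h),
  \qquad
  \lim_{h \to 0} \frac{E(h)}{\lVert h \rVert} = 0 .
\]
% The linear function in question is h \mapsto \nabla f(p) \cdot h;
% the error E(h) must vanish faster than the length of h.
```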
Proof. Let $\gamma$ be a differentiably parameterized curve contained in $S$ and passing through $p$ when $t = 0$. Since $S$ is contained in the domain of $f$, the function $f(\gamma(t))$ is defined for all values of $t$ for which $\gamma(t)$ is defined, and since $f$ is differentiable at $p$, the function $f \circ \gamma$ is differentiable at $t = 0$. In fact, the “chain rule” tells us that

$$(f \circ \gamma)'(0) = \nabla f(p) \cdot \gamma'(0).$$

Since $f$ has an extreme value relative to the set $S$ at the point $p$ and each $\gamma(t)$ is in $S$, it follows that $f \circ \gamma$, a function of one variable, has a local extreme value at $t = 0$, and, therefore, that $(f \circ \gamma)'(0) = 0$. Consequently, $\nabla f(p)$ is perpendicular to the tangent vector $\gamma'(0)$ of the curve at $p$.
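As an illustration of the argument, here is a small worked case. The function, the set, and the curve are chosen for this example only and do not come from the notes: $f(x, y) = x + y$, with $S$ the unit circle, and $p$ the point of $S$ where $f$ is largest.

```latex
\begin{align*}
  p &= \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right), &
  \gamma(t) &= \left(\cos\!\left(t + \tfrac{\pi}{4}\right),\,
                     \sin\!\left(t + \tfrac{\pi}{4}\right)\right),
  \quad \gamma(0) = p, \\
  f(\gamma(t)) &= \sqrt{2}\,\cos t, &
  (f \circ \gamma)'(0) &= -\sqrt{2}\,\sin 0 = 0, \\
  \nabla f(p) &= (1, 1), &
  \gamma'(0) &= \left(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right), \\
  \nabla f(p) \cdot \gamma'(0)
    &= -\tfrac{1}{\sqrt{2}} + \tfrac{1}{\sqrt{2}} = 0 .
\end{align*}
```

Both computations agree: the one-variable function has a critical point at $t = 0$, and the gradient is perpendicular to the tangent vector of the curve there.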
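Proof. If $p$ is a point inside $S$, then every sufficiently short line segment passing through $p$ is contained in $S$ and so, by the theorem, must be perpendicular to $\nabla f(p)$, which means that every vector must be perpendicular to $\nabla f(p)$. The only vector perpendicular to every vector (in particular, to itself) is the zero vector, so $\nabla f(p) = 0$.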
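A quick example, chosen here for illustration, shows why it matters that the point is inside the set. Take $f(x, y) = x^2 + y^2$ on the closed unit disk $S$.

```latex
\begin{align*}
  \nabla f(x, y) &= (2x,\, 2y), \\
  \nabla f(0, 0) &= (0, 0)
    && \text{(minimum of $f$ on $S$, at an interior point)}, \\
  \nabla f(1, 0) &= (2, 0) \neq (0, 0)
    && \text{(maximum of $f$ on $S$, at a boundary point)}.
\end{align*}
```

At the boundary point only curves that stay in the disk are available, so the gradient need not vanish there.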
Proof. The statement is formally true, but probably useless, if $\nabla g(p) = 0$. We assume that $\nabla g(p)$ is not the zero vector. In this case $\nabla g(p)$ is perpendicular to the tangent hyperplane (i.e., plane if $n = 3$ or line if $n = 2$) to $S$ at $p$. Every unit vector in the tangent hyperplane is tangent to some small differentiably parameterized curve segment lying in $S$ and passing through $p$. Hence, by the theorem, $\nabla f(p)$ is also perpendicular to each such curve segment, and, hence, to the tangent hyperplane. Since a hyperplane has only one parallel class of normal vectors, $\nabla f(p)$ and $\nabla g(p)$ must be parallel.
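The parallel-gradient condition can be seen in a small worked case. The functions below are assumptions made for illustration: $f(x, y) = xy$ is the function to be maximized and $S$ is the line $g(x, y) = x + y = 2$.

```latex
\begin{align*}
  \nabla f(x, y) &= (y,\, x), \qquad \nabla g(x, y) = (1,\, 1), \\
  \nabla f \parallel \nabla g
    &\iff (y,\, x) = \lambda\,(1,\, 1)
     \iff x = y, \\
  x + y = 2 \text{ and } x = y
    &\implies (x, y) = (1, 1), \qquad f(1, 1) = 1 .
\end{align*}
```

On the line, $f$ reduces to $x(2 - x)$, which has its maximum at $x = 1$, so the point found by the gradient condition is indeed the constrained maximum.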
Remark. The theorem is useful also in the case where $f$ is a function of $3$ variables and the constraint set $S$ is a curve in space. Then the fact that $p$ lies in $S$ corresponds roughly to two equations for $p$, and the orthogonality condition of the theorem provides, in non-degenerate situations, an additional equation, with the result that (usually) only finitely many such $p$ are possible. (Among these are points that are maxima, minima, and those that are neither.) This is equivalent to the principle of “Lagrange multipliers” discussed in the text.
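To see the count of equations in a concrete case, consider the following sketch. The function and curve are chosen for this example only: $f(x, y, z) = z$, and $S$ is the curve in which the cylinder $x^2 + y^2 = 1$ meets the plane $z = x + y$.

```latex
\begin{align*}
  &\text{Two equations placing } p = (x, y, z) \text{ on } S: &
    x^2 + y^2 &= 1, \qquad z = x + y, \\
  &\text{Parameterization of } S: &
    \gamma(t) &= (\cos t,\ \sin t,\ \cos t + \sin t), \\
  &\text{Tangent vector:} &
    \gamma'(t) &= (-\sin t,\ \cos t,\ \cos t - \sin t), \\
  &\text{Orthogonality, with } \nabla f = (0, 0, 1): &
    \nabla f \cdot \gamma'(t) &= \cos t - \sin t = 0
      \iff x = y, \\
  &\text{Solutions:} &
    p &= \pm\left(\tfrac{1}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}},\ \sqrt{2}\right).
\end{align*}
```

Three equations in three unknowns leave only two candidate points: the one with $z = \sqrt{2}$ is the maximum of $f$ on the curve and the one with $z = -\sqrt{2}$ is the minimum.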