Suppose you want to find the maximum or minimum of a function \( f(x, y) \) but you’re not allowed to explore all of \( \mathbb{R}^2 \) — instead, you’re restricted to points \( (x, y) \) that satisfy a constraint \( g(x, y) = c. \)
A concrete example:
- \( f(x, y) = x^2 y \), a bumpy surface, nonlinear in both variables.
- \( g(x, y) = x^2 + y^2 = 1 \), the unit circle.
We’re looking for the point(s) \( (x, y) \) on the unit circle where \( f(x, y) \) is as big or small as possible.
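Before any geometry, it can help to see the answer by brute force. The sketch below is my own illustration (not part of the Lagrange method, and it assumes numpy is available): it parametrises the unit circle and simply scans \( f \) along it.

```python
# Brute-force check: parametrise the constraint circle and scan f along it.
import numpy as np

def f(x, y):
    return x**2 * y

t = np.linspace(0, 2 * np.pi, 10_000)   # parameter along the circle
x, y = np.cos(t), np.sin(t)             # points satisfying x^2 + y^2 = 1
vals = f(x, y)

i_max, i_min = vals.argmax(), vals.argmin()
print("approx max", vals[i_max], "at", (x[i_max], y[i_max]))
print("approx min", vals[i_min], "at", (x[i_min], y[i_min]))
```

This reports a maximum of roughly 0.385 near \( (\pm 0.82, 0.58) \) and a minimum of roughly \( -0.385 \) near \( (\pm 0.82, -0.58) \); the Lagrange machinery below recovers these points exactly.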
Contour Intuition
The contour lines of the function \( f \) are curves where \( f \) takes on a constant value. If we plot the values of \( f \) on the \( z \) axis, we can visualise the function as a surface in a 3d plot. We can think of the contours as lines running along the surface of \( f \) itself (at constant heights), or we can plot them in the \( x, y \) plane (projecting them down, so to speak). For our purposes here, it will be more useful to think of the contours as living in the \( x, y \) plane. The figure below has both.
The constraint \( g(x, y) = 1 \) is a simple circle, centered at the origin. Now, imagine walking along the circle and watching how the value of \( f \) changes (this walk would be the red line in the chart above).
Claim: At the highest and lowest points (relative to \( f \)), the red circle will just kiss a contour of \( f \); these are the blue points. In other words (remembering that the circle is just a contour of \( g \)), the contours of \( f \) and \( g \) are tangent to one another at these points. That’s the key geometric insight behind the Lagrange multiplier method.
Sketch proof/argument: Suppose (for contradiction) that at one of these highest or lowest points the contours are not tangent, i.e. that the path along the constraint (the red line) crosses a contour of \( f \). Then moving slightly forward or backward along the red line would give a larger or smaller value of \( f \), contradicting the fact that we are at a highest or lowest point.
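Here is a small numerical version of that argument for our example (again just a sketch of my own, assuming numpy): restrict \( f \) to the circle via \( t \mapsto (\cos t, \sin t) \) and find the places where the walk’s rate of change passes through zero. Those are exactly the points where the circle kisses a contour of \( f \).

```python
# Restrict f to the circle and locate where its rate of change along the walk vanishes.
import numpy as np

t = np.linspace(0, 2 * np.pi, 100_000)
h = np.cos(t)**2 * np.sin(t)        # f(cos t, sin t): the value of f along the red circle
dh = np.gradient(h, t)              # numerical derivative of the walk

# sign changes of the derivative mark the stationary points of the walk
idx = np.where(np.diff(np.sign(dh)) != 0)[0]
for i in idx:
    print(f"t = {t[i]:.3f}, point = ({np.cos(t[i]):.3f}, {np.sin(t[i]):.3f}), f = {h[i]:.4f}")
```

Up to the grid resolution, six points come out: the two maxima and two minima from the scan above, plus \( (0, \pm 1) \), where the walk is momentarily flat even though those points are neither the constrained maximum nor minimum.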
Gradients and the Lagrange Condition
Recall the gradient \( \nabla f(x, y) \) of a function \( f(x,y) \) is a vector:
$$
\left(
\begin{array}{c}
\frac{\partial f}{\partial x} \\
\frac{\partial f}{\partial y}
\end{array}
\right)
$$
It is worth remembering that this vector lives in the \( x, y \) plane, not in the 3d space which includes the \( z \) axis.
Often, in visualisations, the gradient is shown pointing up or down along the surface itself. This is a helpful way to think about the gradient, but it can be misleading: strictly speaking, the gradient lives in the input space (the \( x, y \) plane) alone.
Recall also that at any point, the gradient vector of a function \( f \) is perpendicular to the contours of \( f \), with both the vector and the contours lying in the input space (in our example, the \( x, y \) plane) – see this post for the intuition of why the gradient is perpendicular to the contours.
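As a tiny sanity check of that fact for our constraint (my own aside, assuming numpy): at any point of the unit circle the tangent direction is \( (-\sin t, \cos t) \), and its dot product with \( \nabla g = (2x, 2y) \) is zero.

```python
# Check that grad g is perpendicular to the contour of g (the unit circle) at a point.
import numpy as np

t = 0.7                                             # an arbitrary parameter value on the circle
grad_g = np.array([2 * np.cos(t), 2 * np.sin(t)])   # grad g = (2x, 2y) at (cos t, sin t)
tangent = np.array([-np.sin(t), np.cos(t)])         # direction along the circle

print("grad_g . tangent =", grad_g @ tangent)       # ~ 0: gradient is perpendicular to the contour
```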
Hence, the contours of \( f \) and \( g \) being tangent is the same as saying their gradients are parallel:
$$\nabla f(x, y) = \lambda \nabla g(x, y), \quad \text{for some scalar } \lambda $$
This gives us two equations: one for the gradients and one for the constraint:
$$\begin{cases} \nabla f(x, y) = \lambda \nabla g(x, y) \\ g(x, y) = c \end{cases}$$
In our example:
$$f(x, y) = x^2 y \Rightarrow \nabla f = (2x y, x^2) \\
g(x, y) = x^2 + y^2 \Rightarrow \nabla g = (2x, 2y) $$
So we solve:
$$\begin{aligned} 2x y &= \lambda \cdot 2x \\ x^2 &= \lambda \cdot 2y \\ x^2 + y^2 &= 1 \end{aligned}$$
This system gives candidate max/min points on the unit circle.
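If you’d rather not do the case analysis by hand, a computer algebra system will do it. A minimal sketch using sympy (assuming it is installed; the variable names are my own):

```python
# Solve the Lagrange system for f = x^2 y on the unit circle.
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

eqs = [
    sp.Eq(2 * x * y, lam * 2 * x),   # df/dx = lambda * dg/dx
    sp.Eq(x**2, lam * 2 * y),        # df/dy = lambda * dg/dy
    sp.Eq(x**2 + y**2, 1),           # the constraint g(x, y) = 1
]

f = x**2 * y
for sol in sp.solve(eqs, [x, y, lam], dict=True):
    print(sol, " f =", sp.simplify(f.subs(sol)))
```

Six candidates come out: \( (\pm\sqrt{2/3}, 1/\sqrt{3}) \) with \( f = 2/(3\sqrt{3}) \) (the maximum), \( (\pm\sqrt{2/3}, -1/\sqrt{3}) \) with \( f = -2/(3\sqrt{3}) \) (the minimum), and \( (0, \pm 1) \) with \( f = 0 \) and \( \lambda = 0 \). Note that at \( (0, \pm 1) \) we have \( \nabla f = (0, 0) \), which is exactly the edge case discussed next.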
Edge Case: What if \( \nabla f = 0 \)?
The logic above depends on \( \nabla f \neq 0 \). If \( \nabla f = 0 \), then there’s no “direction” in which \( f \) increases or decreases: we’re at a stationary point of \( f \) itself, and the tangency picture no longer makes sense there.
It’s an important caveat in general: when \( \nabla f = 0 \), the condition \( \nabla f = \lambda \nabla g \) is automatically satisfied with \( \lambda = 0 \), so it can’t tell us anything further about such a point. You need to check whether the point lies on the constraint set and handle it separately.
Finally, the Lagrangian
We can repackage the two equation system:
$$\begin{cases} \nabla f(x, y) = \lambda \nabla g(x, y) \\ g(x, y) = c \end{cases}$$
using the Lagrangian function. We define the Lagrangian:
$$L(x, y, \lambda) = f(x, y) - \lambda \, (g(x, y) - c)$$
and then we note that the two equation system is equivalent to
$$\nabla L(x, y, \lambda) = 0 $$
This works because the Lagrangian \( L(x, y, \lambda) \) is a function of \( x, y \) and \( \boldsymbol{\lambda} \). Setting the derivatives with respect to \( x \) and \( y \) to zero recovers the first equation, \( \nabla f(x, y) = \lambda \nabla g(x, y) \), while setting the derivative with respect to \( \lambda \) to zero recovers the constraint equation.
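To see this in the running example, here is the same computation phrased through the Lagrangian (again a sympy sketch, under the same assumptions as before):

```python
# Build the Lagrangian for f = x^2 y, g = x^2 + y^2, c = 1, and set its gradient to zero.
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x**2 * y
g = x**2 + y**2
c = 1

L = f - lam * (g - c)                          # the Lagrangian
grad_L = [sp.diff(L, v) for v in (x, y, lam)]  # partials: 2xy - 2*lam*x, x^2 - 2*lam*y, -(x^2 + y^2 - 1)
print(grad_L)

# setting all three partials to zero reproduces the two-equation system's candidates
print(sp.solve(grad_L, [x, y, lam], dict=True))
```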
Special case still to think about
The function \( f(x, y) = x^2 + y^2 \) has a global minimum of 0 at the origin. If the constraint is \( g(x, y) = x + y = 0 \), then the constrained minimum is also at the origin, where the contour of \( f \) is just a dot (and \( \nabla f = 0 \), so we are in the edge case above).
Another version is \( f(x, y) = x^2 \), i.e. a parabola \( z = x^2 \) swept along the \( y \) axis. In this case, \( f \) attains its minimum value all along the \( y \) axis (the line \( x = 0 \)). If we have the same constraint \( g(x, y) = x + y = 0 \), then the constrained minimum is at the origin. However, if we compute the gradients, we get
$$f(x, y) = x^2 \Rightarrow \nabla f = (2x, 0) \\
g(x,y) = x + y \Rightarrow \nabla g = (1 , 1) $$
These are never parallel when \( x \neq 0 \), so away from the \( y \) axis the contours of \( f \) and \( g \) are indeed never tangent. At \( x = 0 \), however, \( \nabla f = (0, 0) \), and the condition \( \nabla f = \lambda \nabla g \) holds with \( \lambda = 0 \); together with the constraint this picks out the origin. So the Lagrange system still finds the minimum here; what breaks down is the tangency picture, which is exactly the \( \nabla f = 0 \) edge case above (the whole line \( x = 0 \) of minima is precisely where \( \nabla f = 0 \)).
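A quick sympy check of that claim (same assumptions as the earlier sketches):

```python
# Lagrangian for f = x^2 with constraint g(x, y) = x + y = 0 (so c = 0).
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

L = x**2 - lam * (x + y)
grad_L = [sp.diff(L, v) for v in (x, y, lam)]    # 2x - lam, -lam, -(x + y)
print(sp.solve(grad_L, [x, y, lam], dict=True))  # the single candidate: x = 0, y = 0, lam = 0
```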
YouTube video from which I took inspiration: