# for QR codes use inline
# %matplotlib inline
# qr_setting = 'url'
# qrviz_setting = 'show'
#
# for lecture use notebook
%matplotlib inline
qr_setting = None
#
%config InlineBackend.figure_format='retina'
# import libraries
import numpy as np
import matplotlib as mp
import pandas as pd
import matplotlib.pyplot as plt
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from importlib import reload
from datetime import datetime
from IPython.display import Image
from IPython.display import display_html
from IPython.display import display
from IPython.display import Math
from IPython.display import Latex
from IPython.display import HTML;
# Animation Notes
#
from matplotlib import animation
#
# two methods for animation - see:
# http://louistiao.me/posts/notebooks/embedding-matplotlib-animations-in-jupyter-as-interactive-javascript-widgets/
# 1 - directly inline via %matplotlib notebook - advantage: real-time, disadvantage: no manual control
# 2 - encode as video, embedded into notebook (base64) and display via notebook player or JS widget
#     downside - not real time, need to generate complete video before it can be displayed
# 2a - HTML(anim.to_html5_video())  [notebook player, bare bones] [uses html5 <video> tag]
# 2b - HTML(anim.to_jshtml())  [JS widget, extra controls]
# these latter can be made default representation via next line, which can also be 'html5'
mp.rcParams['animation.html'] = 'jshtml'
# 3 - save as an animated GIF and display using 
# rot_animation.save('rotation.gif', dpi=80, writer='imagemagick')
# ![RotatingImage](rotation.gif "Quadratic Form")
# see also https://stackoverflow.com/questions/35532498/animation-in-ipython-notebook/46878531#46878531

Symmetric Matrices

fig = ut.three_d_figure((24, 5), 
                        'Intersection of the positive definite quadratic form z = 3 x1^2 + 7 x2 ^2 with the constraint ||x|| = 1', 
                        -2, 2, -2, 2, 0, 8, 
                        equalAxes = False, figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., 7.]])
for angle in np.linspace(0, 2*np.pi, 200):
    x = np.array([np.cos(angle), np.sin(angle)])
    z = x.T @ qf @ x
    fig.plotPoint(x[0], x[1], z, 'b')
    fig.plotPoint(x[0], x[1], 0, 'g')
fig.plotQF(qf, alpha=0.5)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
# do not call fig.save here
_images/L24SymmetricMatrices_3_0.png

Today we’ll study a very important class of matrices: symmetric matrices.

We’ll see that symmetric matrices have properties that relate to both eigendecomposition and orthogonality.

Furthermore, symmetric matrices open up a broad class of problems we haven’t yet touched on: constrained optimization.

As a result, symmetric matrices arise very often in applications.

Definition. A symmetric matrix is a matrix \(\,A\) such that \(\;A^T = A\).

Clearly, such a matrix is square.

Furthermore, the entries that are not on the diagonal come in pairs, on opposite sides of the diagonal.

Example. Here are three symmetric matrices:

\[\begin{split}\begin{bmatrix}1&0\\0&-3\end{bmatrix},\;\;\;\;\begin{bmatrix}0&-1&0\\-1&5&8\\0&8&-7\end{bmatrix},\;\;\;\;\begin{bmatrix}a&b&c\\b&d&e\\c&e&f\end{bmatrix}\end{split}\]

Here are three nonsymmetric matrices:

\[\begin{split}\begin{bmatrix}1&-3\\3&0\end{bmatrix},\;\;\;\;\begin{bmatrix}0&-4&0\\-6&1&-4\\0&-6&1\end{bmatrix},\;\;\;\;\begin{bmatrix}5&4&3&2\\4&3&2&1\\3&2&1&0\end{bmatrix}\end{split}\]
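Numerically, symmetry is just the statement that a matrix equals its transpose. Here is a minimal check using numpy (imported above as np) on two of the matrices just shown:

# check symmetry numerically: a matrix is symmetric exactly when it equals its transpose
A = np.array([[0., -1., 0.],
              [-1., 5., 8.],
              [0., 8., -7.]])
B = np.array([[1., -3.],
              [3., 0.]])
print(np.array_equal(A, A.T))   # True:  A is symmetric
print(np.array_equal(B, B.T))   # False: B is not symmetric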

Orthogonal Diagonalization

At first glance, a symmetric matrix may not seem that special!

But in fact symmetric matrices have a number of interesting properties.

First, we’ll look at a remarkable fact:

the eigenvectors of a symmetric matrix are orthogonal

Example. Diagonalize the following symmetric matrix:

\[\begin{split}A = \begin{bmatrix}6&-2&-1\\-2&6&-1\\-1&-1&5\end{bmatrix}\end{split}\]

Solution.

The characteristic equation of \(A\) is

\[0 = -\lambda^3 + 17\lambda^2 -90\lambda + 144 \]
\[ = -(\lambda-8)(\lambda-6)(\lambda-3)\]

So the eigenvalues are 8, 6, and 3.

We construct a basis for each eigenspace (using our standard method of finding the nullspace of \(A-\lambda I\)):

\[\begin{split}\lambda_1 = 8: \mathbf{v}_1 = \begin{bmatrix}-1\\1\\0\end{bmatrix};\;\;\;\;\lambda_2=6: \mathbf{v}_2 = \begin{bmatrix}-1\\-1\\2\end{bmatrix};\;\;\;\;\;\lambda_3=3: \mathbf{v}_3=\begin{bmatrix}1\\1\\1\end{bmatrix}\end{split}\]

These three vectors form a basis for \(\mathbb{R}^3.\)

But here is the remarkable thing: these three vectors form an orthogonal set.

That is, any two eigenvectors are orthogonal.

For example,

\[\mathbf{v}_1^T\mathbf{v}_2 = (-1)(-1) + (1)(-1) + (0)(2) = 0\]

As a result, \(\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}\) form an orthogonal basis for \(\mathbb{R}^3.\)

Let’s normalize these vectors so they each have length 1:

\[\begin{split}\mathbf{u}_1 = \begin{bmatrix}-1/\sqrt{2}\\1/\sqrt{2}\\0\end{bmatrix};\;\;\;\;\mathbf{u}_2 = \begin{bmatrix}-1/\sqrt{6}\\-1/\sqrt{6}\\2/\sqrt{6}\end{bmatrix};\;\;\;\;\; \mathbf{u}_3=\begin{bmatrix}1/\sqrt{3}\\1/\sqrt{3}\\1/\sqrt{3}\end{bmatrix}\end{split}\]

The orthogonality of the eigenvectors of a symmetric matrix is quite important.

To see this, let’s write the diagonalization of \(A\) in terms of these eigenvectors and eigenvalues:

\[\begin{split}P = \begin{bmatrix}-1/\sqrt{2}&-1/\sqrt{6}&1/\sqrt{3}\\1/\sqrt{2}&-1/\sqrt{6}&1/\sqrt{3}\\0&2/\sqrt{6}&1/\sqrt{3}\end{bmatrix},\;\;\;\;D = \begin{bmatrix}8&0&0\\0&6&0\\0&0&3\end{bmatrix}.\end{split}\]

Then, \(A = PDP^{-1},\) as usual.

But, here is the interesting thing:

\(P\) is square and has orthonormal columns.

So \(P\) is an orthogonal matrix.

So, that means that \(P^{-1} = P^T.\)

So, \(A = PDP^T.\)
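We can confirm this numerically. The sketch below uses np.linalg.eigh, numpy’s eigensolver for symmetric matrices; it returns eigenvalues in ascending order along with orthonormal eigenvectors, so the columns of P appear in a different order than above:

# numerical check of the orthogonal diagonalization A = P D P^T
A = np.array([[6., -2., -1.],
              [-2., 6., -1.],
              [-1., -1., 5.]])
lam, P = np.linalg.eigh(A)               # eigh is specialized for symmetric matrices
print(lam)                               # approximately [3. 6. 8.]
D = np.diag(lam)
print(np.allclose(P.T @ P, np.eye(3)))   # True: the columns of P are orthonormal
print(np.allclose(P @ D @ P.T, A))       # True: A = P D P^T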

The Spectral Theorem

Here is a theorem that shows that we always get orthogonal eigenvectors when we diagonalize a symmetric matrix:

Theorem. If \(A\) is symmetric, then any two eigenvectors of \(A\) from different eigenspaces are orthogonal.

Proof.

Let \(\mathbf{v}_1\) and \(\mathbf{v}_2\) be eigenvectors that correspond to distinct eigenvalues, say, \(\lambda_1\) and \(\lambda_2.\)

To show that \(\mathbf{v}_1^T\mathbf{v}_2 = 0,\) compute

\[\lambda_1\mathbf{v}_1^T\mathbf{v}_2 = (\lambda_1\mathbf{v}_1)^T\mathbf{v}_2\]
\[=(A\mathbf{v}_1)^T\mathbf{v}_2\]
\[=(\mathbf{v}_1^TA^T)\mathbf{v}_2\]
\[=\mathbf{v}_1^T(A\mathbf{v}_2)\;\;\;\;\;(\mbox{since } A^T = A)\]
\[=\mathbf{v}_1^T(\lambda_2\mathbf{v}_2)\]
\[=\lambda_2\mathbf{v}_1^T\mathbf{v}_2\]

So we conclude that \(\lambda_1(\mathbf{v}_1^T\mathbf{v}_2) = \lambda_2(\mathbf{v}_1^T\mathbf{v}_2).\)

But \(\lambda_1 \neq \lambda_2,\) so this can only happen if \(\mathbf{v}_1^T\mathbf{v}_2 = 0.\)

So \(\mathbf{v}_1\) is orthogonal to \(\mathbf{v}_2.\)

The same argument applies to any other pair of eigenvectors with distinct eigenvalues.

So any two eigenvectors of \(A\) from different eigenspaces are orthogonal.

We can now introduce a special kind of diagonalizability:

An \(n\times n\) matrix \(A\) is said to be orthogonally diagonalizable if there exist an orthogonal matrix \(P\) (with \(P^{-1} = P^T\)) and a diagonal matrix \(D\) such that

\[A = PDP^T = PDP^{-1}\]

Such a diagonalization requires \(n\) linearly independent and orthonormal eigenvectors.

When is this possible?

If \(A\) is orthogonally diagonalizable, then

\[A^T = (PDP^T)^T = (P^T)^TD^TP^T = PDP^T = A\]

So \(A\) is symmetric!

That is, whenever \(A\) is orthogonally diagonalizable, it is symmetric.

It turns out the converse is true (though we won’t prove it).

Hence we obtain the following theorem:

Theorem. An \(n\times n\) matrix is orthogonally diagonalizable if and only if it is a symmetric matrix.

Remember that when we studied diagonalization, we found that it was a difficult process to determine if an arbitrary matrix was diagonalizable.

But here, we have a very nice rule: every symmetric matrix is (orthogonally) diagonalizable.
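As a small illustration (not a proof), we can build a random symmetric matrix and check that it is orthogonally diagonalizable; this sketch assumes a recent numpy with np.random.default_rng:

# illustrate the theorem numerically on a random symmetric matrix
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                                    # symmetrize: A is now symmetric
lam, P = np.linalg.eigh(A)
print(np.allclose(P @ P.T, np.eye(4)))               # True: P is an orthogonal matrix
print(np.allclose(P @ np.diag(lam) @ P.T, A))        # True: A is orthogonally diagonalizable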

Quadratic Forms

One important role of symmetric matrices has to do with quadratic expressions.

Up until now, we have mainly focused on linear equations – equations in which the \(x_i\) terms occur only to the first power.

Actually, though, we have looked at some quadratic expressions when we considered least-squares problems.

For example, we looked at expressions such as \(\Vert \mathbf{x}\Vert^2,\) which is \(\sum_i x_i^2.\)

We’ll now look at quadratic expressions generally. We’ll see that there is a natural and useful connection to symmetric matrices.

Definition. A quadratic form is a function of the variables \(x_1, x_2, \dots, x_n\) in which every term has degree two.

Examples:

\(4x_1^2 + 2x_1x_2 + 3x_2^2\) is a quadratic form.

\(4x_1^2 + 2x_1\) is not a quadratic form.

Quadratic forms arise in many settings, including signal processing, physics, economics, and statistics.

In computer science, quadratic forms arise in optimization and graph theory, among other areas.

Essentially, what an expression like \(x^2\) is to a scalar, a quadratic form is to a vector.

Fact. Every quadratic form can be expressed as \(\mathbf{x}^TA\mathbf{x}\), where \(A\) is a symmetric matrix.

There is a simple way to go from a quadratic form to a symmetric matrix, and vice versa.

To see this, let’s look at some examples.

Example. Let \(\mathbf{x} = \begin{bmatrix}x_1\\x_2\end{bmatrix}.\) Compute \(\mathbf{x}^TA\mathbf{x}\) for the matrix \(A = \begin{bmatrix}4&0\\0&3\end{bmatrix}.\)

Solution.

\[\begin{split}\mathbf{x}^TA\mathbf{x} = \begin{bmatrix}x_1&x_2\end{bmatrix}\begin{bmatrix}4&0\\0&3\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}\end{split}\]
\[\begin{split}= \begin{bmatrix}x_1&x_2\end{bmatrix}\begin{bmatrix}4x_1\\3x_2\end{bmatrix}\end{split}\]
\[= 4x_1^2 + 3x_2^2.\]

Example. Compute \(\mathbf{x}^TA\mathbf{x}\) for the matrix \(A = \begin{bmatrix}3&-2\\-2&7\end{bmatrix}.\)

Solution.

\[\begin{split}\mathbf{x}^TA\mathbf{x} = \begin{bmatrix}x_1&x_2\end{bmatrix}\begin{bmatrix}3&-2\\-2&7\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}\end{split}\]
\[=x_1(3x_1 - 2x_2) + x_2(-2x_1 + 7x_2)\]
\[=3x_1^2-2x_1x_2-2x_2x_1+7x_2^2\]
\[=3x_1^2-4x_1x_2+7x_2^2\]

Notice the simple structure:

\[a_{11} \mbox{ is multiplied by } x_1 x_1\]
\[a_{12} \mbox{ is multiplied by } x_1 x_2\]
\[a_{21} \mbox{ is multiplied by } x_2 x_1\]
\[a_{22} \mbox{ is multiplied by } x_2 x_2\]

We can write this concisely:

\[ \mathbf{x}^TA\mathbf{x} = \sum_{ij} a_{ij}x_i x_j \]
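Here is a quick numerical check of this identity for the matrix from the previous example; the test point is arbitrary:

# check that x^T A x matches 3 x1^2 - 4 x1 x2 + 7 x2^2 at an arbitrary point
A = np.array([[3., -2.],
              [-2., 7.]])
x = np.array([1.5, -2.0])
print(x.T @ A @ x)                              # quadratic form evaluated via the matrix
print(3*x[0]**2 - 4*x[0]*x[1] + 7*x[1]**2)      # same value from the expanded polynomial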

Example. For \(\mathbf{x}\) in \(\mathbb{R}^3\), let

\[Q(\mathbf{x}) = 5x_1^2 + 3x_2^2 + 2x_3^2 - x_1x_2 + 8x_2x_3.\]

Write this quadratic form \(Q(\mathbf{x})\) as \(\mathbf{x}^TA\mathbf{x}\).

Solution.

The coefficients of \(x_1^2, x_2^2, x_3^2\) go on the diagonal of \(A\).

Based on the previous example, we can see that the coefficient of each cross term \(x_ix_j\) is the sum of two values in symmetric positions on opposite sides of the diagonal of \(A\).

So to make \(A\) symmetric, the coefficient of \(x_ix_j\) for \(i\neq j\) must be split evenly between the \((i,j)\)- and \((j,i)\)-entries of \(A\).

You can check that

\[\begin{split}Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} = \begin{bmatrix}x_1&x_2&x_3\end{bmatrix}\begin{bmatrix}5&-1/2&0\\-1/2&3&4\\0&4&2\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}\end{split}\]
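Here is a numerical spot-check of that claim at an arbitrary test point:

# confirm the symmetric matrix reproduces Q(x) = 5 x1^2 + 3 x2^2 + 2 x3^2 - x1 x2 + 8 x2 x3
A = np.array([[5., -0.5, 0.],
              [-0.5, 3., 4.],
              [0., 4., 2.]])
x = np.array([1., 2., -1.])
print(x.T @ A @ x)                                                    # via the matrix
print(5*x[0]**2 + 3*x[1]**2 + 2*x[2]**2 - x[0]*x[1] + 8*x[1]*x[2])    # directly from Q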

Question Time! Q24.1

Classifying Quadratic Forms

Notice that \(\mathbf{x}^TA\mathbf{x}\) is a scalar.

In other words, when \(A\) is an \(n\times n\) matrix, the quadratic form \(Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}\) is a real-valued function with domain \(\mathbb{R}^n\).

Here are four quadratic forms with domain \(\mathbb{R}^2\).

Notice that except at \(\mathbf{x}=\mathbf{0},\) the values of \(Q(\mathbf{x})\) are all positive in the first case, and all negative in the last case.

fig = ut.three_d_figure((24, 1), 'z = 3 x1^2 + 7 x2 ^2',
                        -2, 2, -2, 2, -1, 3, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., 7.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = 3x_1^2 + 7x_2^2$', 'z = 3x_1^2 + 7x_2^2', size = 18)
fig.save();
_images/L24SymmetricMatrices_67_0.png
fig = ut.three_d_figure((24, 2), 'z = 3 x1^2', 
                        -2, 2, -2, 2, -1, 3, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., 0.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = 3x_1^2$', 'z = 3x_1^2', size = 18)
fig.save();
_images/L24SymmetricMatrices_68_0.png
fig.rotate()
fig = ut.three_d_figure((24, 3), 'z = 3 x1^2 - 7 x2 ^2', 
                        -2, 2, -2, 2, -1, 3, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., -7.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = 3x_1^2 - 7x_2^2$', 'z = 3x_1^2 - 7x_2^2', size = 18)
fig.save();
_images/L24SymmetricMatrices_70_0.png
fig.rotate()
fig = ut.three_d_figure((24, 4), 'z = -3 x1^2 - 7 x2 ^2', 
                        -2, 2, -2, 2, -3, 1, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[-3., 0.],[0., -7.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = -3x_1^2 - 7x_2^2$', 'z = -3x_1^2 - 7x_2^2', size = 18)
fig.save();
_images/L24SymmetricMatrices_72_0.png

The differences between these surfaces are important for problems such as optimization.

In an optimization problem, one seeks the minimum or maximum value of a function (perhaps over a subset of its domain).

Definition. A quadratic form \(Q\) is:

  1. positive definite if \(Q(\mathbf{x}) > 0\) for all \(\mathbf{x} \neq 0.\)

  2. positive semidefinite if \(Q(\mathbf{x}) \geq 0\) for all \(\mathbf{x}.\)

  3. indefinite if \(Q(\mathbf{x})\) assumes both positive and negative values.

  4. negative definite if \(Q(\mathbf{x}) < 0\) for all \(\mathbf{x} \neq 0.\)

Classifying Quadratic Forms

Knowing which kind of quadratic form one is dealing with is important for optimization.

Consider these two quadratic forms:

\[\begin{split} P(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}, \mbox{ where } A = \begin{bmatrix}3&2\\2&1\end{bmatrix} \end{split}\]
\[\begin{split} Q(\mathbf{x}) = \mathbf{x}^TB\mathbf{x}, \mbox{ where } B = \begin{bmatrix}3&2\\2&3\end{bmatrix} \end{split}\]

Notice that the matrices differ in only one position.

Let’s say we would like to find

  • the minimum value of \(P(\mathbf{x})\) over all \(\mathbf{x}\), or

  • the minimum value of \(Q(\mathbf{x})\) over all \(\mathbf{x}\)

In each case, we need to ask: is it possible? Does a minimum value even exist?

fig = ut.three_d_figure((24, 5), 'P(x)', 
                        -5, 5, -5, 5, -5, 5, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 2],[2, 1]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = \mathbf{x}^TA\mathbf{x}$', 'z = x^TAx', size = 18)
fig.ax.view_init(azim = 30)
fig.save();
_images/L24SymmetricMatrices_79_0.png
fig = ut.three_d_figure((24, 6), 'Q(x)', 
                        -5, 5, -5, 5, -5, 5, 
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 2],[2, 3]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = \mathbf{x}^TB\mathbf{x}$', 'z = x^TBx', size = 18)
fig.save();
_images/L24SymmetricMatrices_80_0.png

Clearly, we cannot minimize \(\mathbf{x}^TA\mathbf{x}\) over all \(\mathbf{x}\),

… but we can minimize \(\mathbf{x}^TB\mathbf{x}\) over all \(\mathbf{x}\).
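A quick numerical illustration (only suggestive, not a proof): sampling many points shows that \(P(\mathbf{x})\) takes negative values while \(Q(\mathbf{x})\) never does. This sketch assumes a recent numpy with np.random.default_rng:

# sample P(x) = x^T A x and Q(x) = x^T B x at many random points
A = np.array([[3., 2.],
              [2., 1.]])
B = np.array([[3., 2.],
              [2., 3.]])
rng = np.random.default_rng(2)
xs = 10 * rng.standard_normal((5000, 2))
PA = np.einsum('ij,jk,ik->i', xs, A, xs)     # P(x) for each sampled x
PB = np.einsum('ij,jk,ik->i', xs, B, xs)     # Q(x) for each sampled x
print(PA.min())   # negative: P takes negative values (scaling such an x makes them even lower)
print(PB.min())   # nonnegative: Q(x) >= 0 everywhere, so its minimum is 0, at x = 0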

How can we distinguish between these two cases in general?

That is, in high dimension, without the ability to visualize?

There is a remarkably simple way to determine, for any quadratic form, which class it falls into.

Theorem. Let \(A\) be an \(n\times n\) symmetric matrix.

Then a quadratic form \(\mathbf{x}^TA\mathbf{x}\) is

  1. positive definite if and only if the eigenvalues of \(A\) are all positive.

  2. positive semidefinite if and only if the eigenvalues of \(A\) are all nonnegative.

  3. indefinite if and only if \(A\) has both positive and negative eigenvalues.

  4. negative definite if and only if the eigenvalues of \(A\) are all negative.

Proof.

A proof sketch for the positive definite case.

Let’s consider \(\mathbf{u}_i\), an eigenvector of \(A\). Then

\[\mathbf{u}_i^TA\mathbf{u}_i = \lambda_i\mathbf{u}_i^T\mathbf{u}_i.\]

If all eigenvalues are positive, then all such terms are positive.

Since \(A\) is symmetric, it is diagonalizable and so its eigenvectors span \(\mathbb{R}^n\).

So any \(\mathbf{x}\) can be expressed as a weighted sum of \(A\)’s eigenvectors. Choosing an orthonormal set of eigenvectors \(\mathbf{u}_1, \dots, \mathbf{u}_n,\) we can write \(\mathbf{x} = c_1\mathbf{u}_1 + \cdots + c_n\mathbf{u}_n.\)

Expanding \(\mathbf{x}^TA\mathbf{x}\) in this basis, orthonormality makes the cross terms vanish, and we are left with

\[\mathbf{x}^TA\mathbf{x} = \lambda_1c_1^2 + \lambda_2c_2^2 + \cdots + \lambda_nc_n^2,\]

which is a sum of positive terms whenever \(\mathbf{x} \neq 0\) and every \(\lambda_i > 0.\)

Example. Let’s look at the four quadratic forms above. Their associated matrices are

\[\begin{split}(a)\; \begin{bmatrix}3&0\\0&7\end{bmatrix}\;\;\;(b)\;\begin{bmatrix}3&0\\0&0\end{bmatrix}\;\;\;(c)\;\begin{bmatrix}3&0\\0&-7\end{bmatrix}\;\;\;(d)\;\begin{bmatrix}-3&0\\0&-7\end{bmatrix}\end{split}\]

Since each matrix is diagonal, its eigenvalues are just its diagonal entries. So (a) is positive definite, (b) is positive semidefinite, (c) is indefinite, and (d) is negative definite.

Example. Is \(Q(\mathbf{x}) = 3x_1^2 + 2x_2^2 + x_3^2 + 4x_1x_2 + 4x_2x_3\) positive definite?

Solution. Because of all the plus signs, this form “looks” positive definite. But the matrix of the form is

\[\begin{split}\begin{bmatrix}3&2&0\\2&2&2\\0&2&1\end{bmatrix}\end{split}\]

and the eigenvalues of this matrix turn out to be 5, 2, and -1.

So \(Q\) is an indefinite quadratic form.
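We can carry out this check in code. np.linalg.eigvalsh returns the eigenvalues of a symmetric matrix, and their signs classify the form; here is a sketch:

# classify the quadratic form by the signs of the eigenvalues of its matrix
A = np.array([[3., 2., 0.],
              [2., 2., 2.],
              [0., 2., 1.]])
lam = np.linalg.eigvalsh(A)
print(lam)                                   # approximately [-1.  2.  5.]
if np.all(lam > 0):
    print('positive definite')
elif np.all(lam < 0):
    print('negative definite')
elif np.any(lam > 0) and np.any(lam < 0):
    print('indefinite')
else:
    print('semidefinite: some eigenvalue is zero')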

Question Time! Q24.2

Constrained Optimization

A common kind of optimization is to find the maximum or the minimum value of a quadratic form \(Q(\mathbf{x})\) for \(\mathbf{x}\) in some specified set.

For example, a common constraint is that \(\mathbf{x}\) varies over the set of unit vectors.

This is called constrained optimization.

While it can be a difficult problem in general, for quadratic forms it has a particularly elegant solution.

The requirement that a vector \(\mathbf{x}\) in \(\mathbb{R}^n\) be a unit vector can be stated in several equivalent ways:

\[\Vert\mathbf{x}\Vert = 1,\;\;\;\;\Vert\mathbf{x}\Vert^2=1,\;\;\;\;\mathbf{x}^T\mathbf{x} = 1.\]

Here is an example problem:

\[\mbox{minimize } 3x_1^2 + 7x_2^2 \mbox{ subject to } \Vert\mathbf{x}\Vert = 1 \]

Let’s visualize this problem.

Here is the quadratic form \(z = 3x_1^2 + 7x_2^2\):

fig = ut.three_d_figure((24, 7), 'z = 3 x1^2 + 7 x2 ^2', 
                        -2, 2, -2, 2, 0, 8, 
                        equalAxes = False,
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., 7.]])
fig.plotQF(qf, alpha=0.6)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = 3x_1^2 + 7x_2^2$', 'z = 3x_1^2 + 7x_2^2', size = 18)
fig.save();
_images/L24SymmetricMatrices_98_0.png

Now the set of all vectors \(\mathbf{x}\) with \(\Vert\mathbf{x}\Vert = 1\) is a circle.

We plot this circle in the \((x_1, x_2)\) plane in green,

and we plot the value that \(z\) takes on for those points in blue.

fig = ut.three_d_figure((24, 8), 
                        'Intersection of the positive definite quadratic form z = 3 x1^2 + 7 x2 ^2 with the constraint ||x|| = 1', 
                        -2, 2, -2, 2, 0, 8, 
                        equalAxes = False,
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., 7.]])
for angle in np.linspace(0, 2*np.pi, 200):
    x = np.array([np.cos(angle), np.sin(angle)])
    z = x.T @ qf @ x
    fig.plotPoint(x[0], x[1], z, 'b')
    fig.plotPoint(x[0], x[1], 0, 'g')
fig.plotQF(qf, alpha=0.5)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'Intersection of quadratic form $z = 3 x_1^2 + 7 x_2^2$ and $\Vert \mathbf{x}\Vert = 1$',
                'Intersection of quadratic form z = 3 x_1^2 + 7 x_2^2 and ||x|| = 1',
                size = 18)
fig.save();
_images/L24SymmetricMatrices_100_0.png

The blue points show the values of \(z = 3x_1^2 + 7x_2^2\) for those \(\mathbf{x}\) with \(\Vert \mathbf{x}\Vert = 1\).

Now, we can see the geometric sense of a constrained optimization problem.

In particular, we can visualize the solutions to two constrained optimizations:

\[ \min 3x_1^2 + 7x_2^2 \mbox{ such that } \Vert\mathbf{x}\Vert = 1 \]

and

\[ \max 3x_1^2 + 7x_2^2 \mbox{ such that } \Vert\mathbf{x}\Vert = 1 \]
fig = ut.three_d_figure((24, 9), 
                        'Maximum and minimum of the positive definite quadratic form z = 3 x1^2 + 7 x2 ^2 subject to ||x|| = 1', 
                        -2, 2, -2, 2, 0, 8,
                        equalAxes = False,
                        figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., 7.]])
for angle in np.linspace(0, 2*np.pi, 200):
    x = np.array([np.cos(angle), np.sin(angle)])
    z = x.T @ qf @ x
    fig.plotPoint(x[0], x[1], z, 'b')
#
max_pt = np.array([0., 1.])
max_pt_z = max_pt.T @ qf @ max_pt
fig.plotPoint(max_pt[0], max_pt[1], max_pt_z, 'r')
fig.text(max_pt[0]+.2, max_pt[1]+.2, max_pt_z+.2, '(0, 1, 7)', '(0, 1, 7)', 14)
min_pt = np.array([1., 0.])
#
min_pt_z = min_pt.T @ qf @ min_pt
fig.plotPoint(min_pt[0], min_pt[1], min_pt_z, 'r')
fig.text(min_pt[0]+.2, min_pt[1]+.2, min_pt_z-.8, '(1, 0, 3)', '(1, 0, 3)', 14)
#
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
#
fig.set_title(r'Max and Min of $z = 3 x_1^2 + 7 x_2^2$ s.t. $\Vert x\Vert = 1$',
                'Max and Min of z = 3 x_1^2 + 7 x_2^2 s.t. ||x|| = 1',
                size = 18)
fig.save();
_images/L24SymmetricMatrices_103_0.png

Solving Constrained Optimization Over Quadratic Forms

When a quadratic form has no cross-product terms, it is easy to find the maximum and minimum of \(Q(\mathbf{x})\) for \(\mathbf{x}^T\mathbf{x} = 1.\)

Example. Find the maximum and minimum values of \(Q(\mathbf{x}) = 9x_1^2 + 4x_2^2 + 3x_3^2\) subject to the constraint \(\mathbf{x}^T\mathbf{x} = 1.\)

Since \(x_2^2\) and \(x_3^2\) are nonnegative, we know that

\[4x_2^2 \leq 9x_2^2\;\;\;\;\mbox{and}\;\;\;\;3x_3^2\leq 9x_3^2.\]

So

\[Q(\mathbf{x}) = 9x_1^2 + 4x_2^2 + 3x_3^2\]
\[\leq 9x_1^2 + 9x_2^2 + 9x_3^2\]
\[=9(x_1^2 + x_2^2 + x_3^2)\]
\[=9\]

whenever \(x_1^2 + x_2^2 + x_3^2 = 1.\) So the maximum value of \(Q(\mathbf{x})\) cannot exceed 9 when \(\mathbf{x}\) is a unit vector.

Furthermore, \(Q(\mathbf{x}) = 9\) when \(\mathbf{x}=(1,0,0).\)

Thus 9 is the maximum value of \(Q(\mathbf{x})\) for \(\mathbf{x}^T\mathbf{x} = 1\).

A similar argument shows that the minimum value of \(Q(\mathbf{x})\) when \(\mathbf{x}^T\mathbf{x}=1\) is 3.
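As a numerical sanity check (the random sampling below is only illustrative), every sampled unit vector gives a value of \(Q(\mathbf{x})\) between 3 and 9; this assumes a recent numpy with np.random.default_rng:

# the constrained max/min of Q(x) = 9 x1^2 + 4 x2^2 + 3 x3^2 over unit vectors
A = np.diag([9., 4., 3.])
rng = np.random.default_rng(1)
xs = rng.standard_normal((1000, 3))
xs = xs / np.linalg.norm(xs, axis=1, keepdims=True)   # random unit vectors
vals = np.einsum('ij,jk,ik->i', xs, A, xs)            # Q(x) for each sample
print(vals.min(), vals.max())                         # all sampled values lie between 3 and 9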

Eigenvalues Solve Constrained Optimization

Observation.

Note that the matrix of the quadratic form in the example is

\[\begin{split}A = \begin{bmatrix}9&0&0\\0&4&0\\0&0&3\end{bmatrix}\end{split}\]

So the eigenvalues of \(A\) are \(9, 4,\) and \(3\).

We note that the greatest and least eigenvalues equal, respectively, the (constrained) maximum and minimum of \(Q(\mathbf{x})\).

In fact, this is true for any quadratic form.

Theorem. Let \(A\) be a symmetric matrix, and let

\[M = \max_{\mathbf{x}^T\mathbf{x} = 1} \mathbf{x}^TA\mathbf{x}.\]

Then \(M\) is the greatest eigenvalue \(\lambda_1\) of \(A\).

The value of \(Q(\mathbf{x})\) is \(\lambda_1\) when \(\mathbf{x}\) is a unit eigenvector corresponding to \(\lambda_1\).

A similar theorem holds for the constrained minimum of \(Q(\mathbf{x})\): it equals the least eigenvalue \(\lambda_n\) of \(A\), and is attained when \(\mathbf{x}\) is a unit eigenvector corresponding to \(\lambda_n\).
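In code, the theorem gives a direct recipe. Here is a sketch using the matrix \(B\) from earlier: the largest and smallest eigenvalues from np.linalg.eigh are the constrained maximum and minimum, attained at the corresponding unit eigenvectors:

# constrained optimization of x^T A x over unit vectors, solved via eigenvalues
A = np.array([[3., 2.],
              [2., 3.]])
lam, P = np.linalg.eigh(A)            # eigenvalues in ascending order, orthonormal eigenvectors
u_min, u_max = P[:, 0], P[:, -1]      # unit eigenvectors for the least and greatest eigenvalues
print(lam[-1], u_max.T @ A @ u_max)   # constrained maximum M = largest eigenvalue (5.0)
print(lam[0],  u_min.T @ A @ u_min)   # constrained minimum   = smallest eigenvalue (1.0)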

Question Time! Q24.3