## Introduction
Eigen-analysis provides structure for symmetric/PSD matrices (covariances, Laplacians) and for general matrices (Markov chains, Jacobians). Power iteration is a simple iterative method that approximates the dominant (largest-magnitude) eigenvalue and its eigenvector by repeated matrix multiplication and normalization.
## Important ideas
- Eigenpairs $(\lambda, v)$ satisfy $Av = \lambda v$; for symmetric $A$, eigenvectors can be chosen to form an orthonormal basis.
- Spectral theorem (symmetric): $A = Q \Lambda Q^\top$ with real eigenvalues; $Q$ orthogonal, $\Lambda$ diagonal.
- Eigengap and convergence: Power iteration converges to the dominant eigenvector at rate $|\lambda_2/\lambda_1|^k$ when $|\lambda_1|>|\lambda_2|$.
- Rayleigh quotient: $\rho(x) = \dfrac{x^\top A x}{x^\top x}$; maximized at $\lambda_\max$, minimized at $\lambda_\min$ for symmetric $A$.
- PSD matrices: Eigenvalues are nonnegative; covariance and Gram matrices are PSD.
- Gershgorin disks: Every eigenvalue lies in the union of disks centered at the diagonal entries $a_{ii}$ with radii equal to the off-diagonal absolute row sums; this gives quick spectral bounds.
- Perron–Frobenius: An irreducible nonnegative matrix has a simple positive dominant eigenvalue with a strictly positive eigenvector (Markov chains, PageRank).
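The ideas above can be checked numerically. A minimal sketch, assuming NumPy, that runs power iteration on a random symmetric PSD matrix and uses the Rayleigh quotient to read off the eigenvalue estimate (matrix and iteration count are illustrative choices, not from the source):

```python
import numpy as np

# Illustrative PSD matrix: a Gram matrix B B^T is symmetric PSD.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T

def power_iteration(A, iters=500):
    """Approximate the dominant eigenpair by repeated multiply-and-normalize."""
    x = np.ones(A.shape[0])          # generic start vector
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)       # normalize to avoid overflow/underflow
    lam = x @ A @ x                  # Rayleigh quotient (x is unit-norm)
    return lam, x

lam, v = power_iteration(A)
lam_true = np.linalg.eigvalsh(A)[-1]  # eigvalsh returns ascending eigenvalues
```

The convergence rate is governed by the eigengap: the component of $x$ along the second eigenvector shrinks by a factor $|\lambda_2/\lambda_1|$ per step.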
## Relevance to ML
- PCA: Leading eigenvectors of the covariance capture maximal variance; truncation yields the best low-rank approximation.
- Spectral clustering: Laplacian eigenvectors reveal cluster structure via graph cuts.
- PageRank/Markov chains: The stationary distribution is the dominant eigenvector.
- Stability: Jacobian eigenvalues inform exploding/vanishing dynamics.
- Attention/covariance spectra: Eigenvalue spread relates to conditioning and numerical stability.
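The PCA connection can be sketched directly from the spectral theorem: eigendecompose the sample covariance and project onto the top eigenvectors. A hedged example, assuming NumPy (the data and dimensions are made up for illustration):

```python
import numpy as np

# Anisotropic synthetic data: variance is concentrated along the first axis.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)               # center the data
C = Xc.T @ Xc / (len(X) - 1)          # sample covariance (symmetric PSD)
evals, evecs = np.linalg.eigh(C)      # ascending eigenvalues, orthonormal columns
order = np.argsort(evals)[::-1]       # sort descending by explained variance
components = evecs[:, order[:2]]      # top-2 principal directions
Z = Xc @ components                   # 2-D projection
```

Because $C$ is PSD, all eigenvalues are nonnegative, and the projected variance along each component equals the corresponding eigenvalue.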
## Algorithmic development (milestones)
- 1900s: Spectral theorem for symmetric matrices.
- 1907–1912: Perron–Frobenius theory for nonnegative matrices.
- 1929: Power iteration (von Mises), with later refinements by Householder and others.
- 1931: Gershgorin circle theorem (eigenvalue bounds).
- 1998: PageRank (Brin–Page) applies power iteration at web scale.
- 2000s: Spectral clustering widely adopted (Ng–Jordan–Weiss 2002).
- 2010s: Randomized eigensolvers/SVD for large-scale ML.
## Definitions
- Eigenpair: $(\lambda, v\neq 0)$ with $Av=\lambda v$.
- Spectrum $\sigma(A)$: set of eigenvalues; spectral radius $\rho(A)=\max_i |\lambda_i|$.
- Rayleigh quotient: $\rho(x)=\dfrac{x^\top A x}{x^\top x}$ for $x\neq0$.
- Power iteration: $x_{k+1}=\dfrac{A x_k}{\lVert A x_k\rVert}$; converges to the dominant eigenvector under an eigengap.
- Laplacian: $L=D-W$ (unnormalized), $L_{\text{sym}}=I-D^{-1/2} W D^{-1/2}$ for graph spectral methods.
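The Laplacian definition can be made concrete on a toy graph. A minimal sketch, assuming NumPy, using a hypothetical 5-node graph with two disconnected components: $L = D - W$ is symmetric PSD, and the multiplicity of eigenvalue 0 equals the number of connected components, which is what spectral clustering exploits.

```python
import numpy as np

# Adjacency matrix of a small graph: a triangle {0,1,2} and an edge {3,4}.
W = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

D = np.diag(W.sum(axis=1))        # degree matrix
L = D - W                         # unnormalized graph Laplacian
evals = np.linalg.eigvalsh(L)     # L is symmetric, so eigvalsh applies

# Count (near-)zero eigenvalues = number of connected components.
n_components = int(np.sum(evals < 1e-10))
```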