Introduction#
Rank and null space describe how information flows through matrices:
Rank $r$ = number of independent columns/rows (nonzero singular values)
Null space $\text{null}(A)$ = set of inputs mapped to zero (lost information)
Column/row spaces = subspaces where outputs/inputs live; orthogonal complements relate via FTLA
Pseudoinverse solves least squares even when $A$ is rank-deficient (minimal-norm solutions)
Low-rank structure compresses models and reveals latent factors (factorization)
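As a minimal sketch of the first two points (assuming NumPy and SciPy are available), rank and null space can be computed directly; the matrix below is a hypothetical example with one dependent column:

```python
import numpy as np
from scipy.linalg import null_space

# A rank-deficient 3x3 matrix: the third column is col1 + col2
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

r = np.linalg.matrix_rank(A)   # rank, computed from singular values
N = null_space(A)              # orthonormal basis for null(A)

print(r)                       # 2: only two independent columns
print(np.allclose(A @ N, 0))   # True: null-space inputs are mapped to zero
```

The null-space basis vector here encodes exactly the "lost" direction: any input component along it produces no output.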
Important ideas#
Row rank equals column rank
$\operatorname{rank}(A)$ is the dimension of $\text{col}(A)$ and equals that of $\text{row}(A)$.
Rank via singular values
If $A=U\Sigma V^\top$, then $\operatorname{rank}(A)$ equals the number of nonzero singular values $\sigma_i$.
Rank–nullity theorem
For $A\in\mathbb{R}^{m\times d}$, $$\operatorname{rank}(A) + \operatorname{nullity}(A) = d.$$
Fundamental theorem of linear algebra (FTLA)
For $A\in\mathbb{R}^{m\times d}$: $\mathbb{R}^m = \text{col}(A) \oplus \text{null}(A^\top)$ and $\mathbb{R}^d = \text{row}(A) \oplus \text{null}(A)$ (orthogonal decompositions).
Rank of products and sums
$\operatorname{rank}(AB) \le \min\{\operatorname{rank}(A), \operatorname{rank}(B)\}$; for sums, subadditivity: $\operatorname{rank}(A+B) \le \operatorname{rank}(A) + \operatorname{rank}(B)$.
Pseudoinverse $A^+$
The Moore–Penrose pseudoinverse gives the minimum-norm least-squares solution $x^* = A^+ b$; it satisfies the Penrose conditions, in particular $AA^+A = A$.
Numerical rank
Practical rank uses thresholds on singular values to handle floating-point noise.
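Several of these ideas can be checked numerically. A minimal sketch (assuming NumPy and SciPy; the dimensions and tolerance rule are illustrative choices, not a fixed convention):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
m, d = 5, 4
# Rank-2 matrix by construction: product of a 5x2 and a 2x4 factor
A = rng.standard_normal((m, 2)) @ rng.standard_normal((2, d))

# Rank-nullity: rank(A) + nullity(A) = d
rank = np.linalg.matrix_rank(A)
N = null_space(A)
print(rank + N.shape[1] == d)   # True

# FTLA: row(A) is orthogonal to null(A), so every row annihilates N
print(np.allclose(A @ N, 0))    # True

# Numerical rank: count singular values above a floating-point threshold
s = np.linalg.svd(A, compute_uv=False)
tol = max(m, d) * np.finfo(A.dtype).eps * s[0]
print(int(np.sum(s > tol)))     # 2
```

Note the tolerance scales with the largest singular value and machine epsilon, which is why "practical rank" can differ from exact symbolic rank.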
Relevance to ML#
Multicollinearity: rank-deficient design $X$ yields non-unique OLS solutions; regularization/pseudoinverse needed.
PCA/compression: low rank captures variance efficiently; truncation yields best rank-$k$ approximation.
Recommendation systems: user–item matrices modeled as low-rank factorization.
Kernels/Gram matrices: rank informs capacity and generalization; $\operatorname{rank}(XX^\top) \le \min(n,d)$.
Attention: score matrix $QK^\top$ has rank bounded by $\min(n, d_k)$; head dimension limits expressivity.
Deep nets: bottleneck layers cap the rank of the composed linear map; adapters/LoRA factorize weight updates into low-rank pieces.
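The PCA/compression point rests on the Eckart–Young theorem: truncating the SVD gives the best rank-$k$ approximation in Frobenius norm. A sketch assuming NumPy, with a randomly generated data matrix for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))

def truncated_svd(X, k):
    # Best rank-k approximation: keep the top-k singular triplets
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

X2 = truncated_svd(X, 2)
print(np.linalg.matrix_rank(X2))   # 2

# Eckart–Young: the Frobenius error is exactly the energy in the
# discarded singular values
s = np.linalg.svd(X, compute_uv=False)
err = np.linalg.norm(X - X2, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[2:] ** 2))))   # True
```

The same truncation underlies SVD-based recommenders: storing the two factors costs $O(k(n+d))$ instead of $O(nd)$ for the full matrix.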
Algorithmic development (milestones)#
1936: Eckart–Young — best rank-$k$ approximation via SVD.
1955: Penrose — Moore–Penrose pseudoinverse.
1990s–2000s: Matrix factorization in recommender systems (SVD-based, ALS).
2009: Candès–Recht — nuclear norm relaxation for matrix completion.
2011: Halko–Martinsson–Tropp — randomized SVD for large-scale low-rank.
2019–2021: Adapter methods culminate in LoRA (Hu et al., 2021), which parameterizes transformer weight updates as low-rank factors for efficient fine-tuning.
Definitions#
$\operatorname{rank}(A)$: dimension of $\text{col}(A)$ (or $\text{row}(A)$); number of nonzero singular values.
$\text{null}(A) = \{x: Ax=0\}$; $\text{null}(A^\top)$ similarly.
$\text{col}(A)$: span of columns; $\text{row}(A)$: span of rows.
FTLA decompositions (for $A\in\mathbb{R}^{m\times d}$): $\mathbb{R}^m = \text{col}(A) \oplus \text{null}(A^\top)$, $\mathbb{R}^d = \text{row}(A) \oplus \text{null}(A)$.
Pseudoinverse: $A^+ = V \Sigma^+ U^\top$, where $\Sigma^+$ transposes $\Sigma$ and inverts its nonzero singular values $\sigma_i$.
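These definitions can be tied together in one sketch: building $A^+$ from the SVD by inverting only the singular values above a tolerance, then checking it against NumPy's own `pinv` and the Penrose condition (the matrix and right-hand side below are illustrative):

```python
import numpy as np

# Rank-deficient least-squares problem: rank(A) = 2 < 3 columns
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])
b = np.array([1.0, 2.0, 3.0, 2.0])

# A+ = V Sigma+ U^T: reciprocate singular values above a threshold
U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
s_inv = np.array([1.0 / si if si > tol else 0.0 for si in s])
A_pinv = Vt.T @ np.diag(s_inv) @ U.T

x = A_pinv @ b   # minimum-norm least-squares solution

print(np.allclose(A_pinv, np.linalg.pinv(A)))   # True
print(np.allclose(A @ A_pinv @ A, A))           # True: Penrose AA+A = A
```

Zeroing (rather than inverting) the near-zero singular values is what makes the solution well defined for rank-deficient $A$: among all least-squares minimizers, $A^+ b$ is the one with smallest norm.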