Iterative — Krylov Methods & Randomized SVD¶
Overview¶
Krylov subspace methods (Arnoldi, Lanczos), Randomized SVD, and the linear operator interface. Level 5 in the TinyMath dependency chain — the highest level.
Header: `#include "tiny_iterative.h"`
Operator Interface¶
The operator abstraction lets iterative methods work with any linear transformation without storing the full matrix.
```c
typedef struct {
    int rows, cols;
    void *ctx;
    tiny_error_t (*matvec)(const float *x, float *y, void *ctx);
    tiny_error_t (*rmatvec)(const float *x, float *y, void *ctx);
} tiny_operator_f32_t;
```
Pre-built Operators¶
| Init Function | Matrix Structure | Storage | Use Case |
|---|---|---|---|
| `tiny_dense_operator_init` | General contiguous matrix | \(O(m \cdot n)\) | General purpose |
| `tiny_hankel_operator_init` | Hankel from signal \(s(t)\) | \(O(N)\) | ERA/SSI correlation (implicit) |
| `tiny_block_hankel_operator_init` | Block Hankel from \(R_{lag}[ch][ch]\) | \(O(n_{ch}^2 \cdot N_{lags})\) | SSI Toeplitz-free |
| `tiny_covariance_operator_init` | Covariance from eigenvalues | \(O(n_{ch} \cdot i_{max})\) | SSI covariance |
Block-Hankel Memory Savings
Instead of constructing a \(T_{dim} \times T_{dim}\) Toeplitz matrix (\(O(n^2)\) memory), the operator computes each matvec on the fly (\(O(N \cdot n)\) time). For \(T_{dim} = 100\), this saves \(\sim 40\) KB of RAM.
Arnoldi Iteration¶
```c
size_t tiny_arnoldi_f32_workspace_size(int n, int k);
tiny_error_t tiny_arnoldi_f32(const tiny_operator_f32_t *op,
                              const float *v0, int n, int k,
                              float *V, float *H,
                              void *ws, size_t ws_size);
```
Arnoldi builds an orthonormal Krylov basis via modified Gram-Schmidt (MGS). Given operator \(A\) and starting vector \(v_0\), it computes \(V_k = [v_1, ..., v_k]\) spanning \(\mathcal{K}_k(A, v_0)\), plus the upper Hessenberg \(H_k\).
Parameters:
| Param | Type | Description |
|---|---|---|
| `op` | `const tiny_operator_f32_t *` | Linear operator (matvec) |
| `v0` | `const float *` | Starting vector, length \(n\) |
| `n` | `int` | Problem dimension |
| `k` | `int` | Krylov subspace dimension (number of iterations) |
| `V` | `float *` | Output. \(n \times k\) basis, column-major (`V[i + j*n]` = column \(j\)) |
| `H` | `float *` | Output. \(k \times k\) upper Hessenberg (row-major) |
| `ws` | `void *` | Workspace buffer |
| `ws_size` | `size_t` | Workspace size in bytes |
Returns: TINY_OK on success.
Eigenvalues of \(H_k\) approximate extreme eigenvalues of \(A\) (Ritz values).
Use cases: Subspace iteration for top-\(k\) eigenvalues, sparse eigenvalue problems.
Lanczos Iteration¶
```c
size_t tiny_lanczos_f32_workspace_size(int n, int k);
tiny_error_t tiny_lanczos_f32(const tiny_operator_f32_t *op,
                              const float *v0, int n, int k,
                              float *alpha, float *beta, float *V,
                              void *ws, size_t ws_size);
```
Lanczos is Arnoldi specialized to symmetric operators (\(A = A^T\)): the Hessenberg matrix collapses to a tridiagonal one, stored as `alpha` (diagonal) and `beta` (off-diagonal).
Parameters:
| Param | Type | Description |
|---|---|---|
| `op` | `const tiny_operator_f32_t *` | Linear operator (must be symmetric) |
| `v0` | `const float *` | Starting vector, length \(n\) |
| `n` | `int` | Problem dimension |
| `k` | `int` | Krylov subspace dimension |
| `alpha` | `float *` | Output. Diagonal elements (\(k\) values) |
| `beta` | `float *` | Output. Off-diagonal elements (\(k\) values) |
| `V` | `float *` | Output. \(n \times k\) basis, column-major |
| `ws` | `void *` | Workspace buffer |
| `ws_size` | `size_t` | Workspace size in bytes |
Performance: 2× faster than Arnoldi (only one MGS orthogonalization per step).
Loss of Orthogonality
In float32, Lanczos loses orthogonality after \(\sim 10\) iterations due to round-off. Re-orthogonalization is not built-in; use Arnoldi if orthogonality is critical.
Randomized SVD¶
Approximates \(A \approx U \cdot \Sigma \cdot V^T\) using random projections (Halko et al., 2011).
Algorithm¶
1. Ω = random n × (k+p) Gaussian matrix
2. Y = A · Ω — sketch (m × (k+p))
3. Y = Q · R — QR of sketch
4. B = Q^T · A — project to (k+p) × n
5. B = Ũ · Σ · V^T — full SVD of small B
6. U = Q · Ũ, Σ, V — reconstruct
Dense Matrix¶
```c
size_t tiny_rsvd_dense_f32_workspace_size(int m, int n, int k, int p, int q);
tiny_error_t tiny_rsvd_dense_f32(const float *A, int m, int n, int k, int p, int q,
                                 float *U, float *S, float *Vt,
                                 void *ws, size_t ws_size);
```
Operator Form¶
```c
size_t tiny_rsvd_f32_workspace_size(int m, int n, int k, int p, int q);
tiny_error_t tiny_rsvd_f32(const tiny_operator_f32_t *op, int m, int n, int k,
                           float *U, float *S, float *Vt,
                           void *ws, size_t ws_size);
```
Parameters:
| Param | Type | Description |
|---|---|---|
| `A` / `op` | — | Input matrix or operator |
| `m`, `n` | `int` | Matrix dimensions |
| `k` | `int` | Target rank (number of singular values/vectors) |
| `p` | `int` | Oversampling (default 5). Extra columns beyond \(k\) |
| `q` | `int` | Power iterations (default 1). Each pass improves accuracy for slow decay |
| `U` | `float *` | Output. \(m \times k\) left singular vectors (column-major) |
| `S` | `float *` | Output. \(k\) singular values (descending) |
| `Vt` | `float *` | Output. \(k \times n\) right singular vectors (row-major) |
| `ws` | `void *` | Workspace buffer |
| `ws_size` | `size_t` | Workspace size in bytes |
Workspace: \(\sim (m \cdot r + n \cdot r + r^2)\) floats where \(r = k + p\). For \(m=75, n=50, k=10\): \(\sim 8\) KB.
RSVD Performance
RSVD with \(k=10, p=5, q=1\) matches full Jacobi SVD accuracy at roughly 10× the speed for \(50 \times 50\) matrices. Used as the primary SVD in the pairwise ERA implementation.
SVD Utilities¶
```c
void tiny_svd_cumulative_energy_f32(const float *S, int len, float *energy_out);
int tiny_svd_rank_by_energy_f32(const float *S, int len, float threshold);
int tiny_svd_rank_by_threshold_f32(const float *S, int len, float tol);
float tiny_svd_condition_estimate_f32(const float *S, int len);
```
| Function | Purpose | Example |
|---|---|---|
| `cumulative_energy` | \(\text{energy}[i] = \frac{\sum_{j=0}^{i} S_j^2}{\sum_j S_j^2}\) | Track how many SVs explain 99% energy |
| `rank_by_energy` | First index where cumulative energy \(\geq\) threshold | `rank_by_energy(S, len, 0.99)` → model order |
| `rank_by_threshold` | Count of \(S_k > \text{tol} \cdot S_0\) | `rank_by_threshold(S, len, 1e-3)` → numerical rank |
| `condition_estimate` | \(S_0 / S_{n-1}\) | Condition number \(\kappa\) — if \(> 10^6\), ill-conditioned |
Engineering Notes¶
- Random seed: RSVD uses a Box-Muller Gaussian RNG with static state, so results are deterministic across boot cycles.
- Power iteration: Each power pass effectively applies \((A A^T)^q A\), raising the singular-value ratios and improving accuracy when the spectrum decays slowly. For most SHM data with exponential decay, \(q=1\) is sufficient.
- Operator vs dense: If the matrix fits in RAM (\(< 100 \times 100\)), use `tiny_rsvd_dense_f32`. The operator form only helps for matrices too large to store explicitly (e.g., Toeplitz as an operator).