
Iterative — Krylov Methods & Randomized SVD

Overview

Krylov subspace methods (Arnoldi, Lanczos), Randomized SVD, and the linear operator interface. Level 5 in the TinyMath dependency chain — the highest level.

Header: #include "tiny_iterative.h"


Operator Interface

The operator abstraction lets iterative methods work with any linear transformation without storing the full matrix.

typedef struct {
    int rows, cols;   /* operator dimensions: y has length rows, x has length cols */
    void *ctx;        /* user context handed back to the callbacks */
    tiny_error_t (*matvec)(const float *x, float *y, void *ctx);   /* y = A x   */
    tiny_error_t (*rmatvec)(const float *x, float *y, void *ctx);  /* y = A^T x */
} tiny_operator_f32_t;

Pre-built Operators

| Init Function | Matrix Structure | Storage | Use Case |
|---|---|---|---|
| tiny_dense_operator_init | General contiguous matrix | \(O(m \cdot n)\) | General purpose |
| tiny_hankel_operator_init | Hankel from signal \(s(t)\) | \(O(N)\) | ERA/SSI correlation (implicit) |
| tiny_block_hankel_operator_init | Block Hankel from \(R_{lag}[ch][ch]\) | \(O(n_{ch}^2 \cdot N_{lags})\) | SSI Toeplitz-free |
| tiny_covariance_operator_init | Covariance from eigenvalues | \(O(n_{ch} \cdot i_{max})\) | SSI covariance |

Block-Hankel Memory Savings

Instead of constructing a \(T_{dim} \times T_{dim}\) Toeplitz matrix (\(O(n^2)\) memory), the operator computes each matvec on the fly (\(O(N \cdot n)\) time). For \(T_{dim} = 100\), this saves 40 KB of RAM (100 × 100 × 4 bytes).


Arnoldi Iteration

size_t       tiny_arnoldi_f32_workspace_size(int n, int k);
tiny_error_t tiny_arnoldi_f32(const tiny_operator_f32_t *op,
                               const float *v0, int n, int k,
                               float *V, float *H,
                               void *ws, size_t ws_size);

Arnoldi builds an orthonormal Krylov basis via modified Gram-Schmidt (MGS). Given operator \(A\) and starting vector \(v_0\), it computes \(V_k = [v_1, ..., v_k]\) spanning \(\mathcal{K}_k(A, v_0)\), plus the upper Hessenberg \(H_k\).

Parameters:

| Param | Type | Description |
|---|---|---|
| op | const tiny_operator_f32_t * | Linear operator (matvec) |
| v0 | const float * | Starting vector, length \(n\) |
| n | int | Problem dimension |
| k | int | Krylov subspace dimension (number of iterations) |
| V | float * | Output. \(n \times k\) basis, column-major (V[i + j*n] is element \(i\) of column \(j\)) |
| H | float * | Output. \(k \times k\) upper Hessenberg (row-major) |
| ws | void * | Workspace buffer |
| ws_size | size_t | Workspace size in bytes |

Returns: TINY_OK on success.

Eigenvalues of \(H_k\) approximate extreme eigenvalues of \(A\) (Ritz values).

Use cases: Subspace iteration for top-\(k\) eigenvalues, sparse eigenvalue problems.


Lanczos Iteration

size_t       tiny_lanczos_f32_workspace_size(int n, int k);
tiny_error_t tiny_lanczos_f32(const tiny_operator_f32_t *op,
                               const float *v0, int n, int k,
                               float *alpha, float *beta, float *V,
                               void *ws, size_t ws_size);

Lanczos = Arnoldi for symmetric operators (\(A = A^T\)). Produces a tridiagonal matrix instead of full Hessenberg.

Parameters:

| Param | Type | Description |
|---|---|---|
| op | const tiny_operator_f32_t * | Linear operator (must be symmetric) |
| v0 | const float * | Starting vector, length \(n\) |
| n | int | Problem dimension |
| k | int | Krylov subspace dimension |
| alpha | float * | Output. Diagonal elements (\(k\) values) |
| beta | float * | Output. Off-diagonal elements (\(k\) values) |
| V | float * | Output. \(n \times k\) basis, column-major |
| ws | void * | Workspace buffer |
| ws_size | size_t | Workspace size in bytes |

Performance: 2× faster than Arnoldi. The three-term recurrence orthogonalizes each new vector against only the previous two basis vectors, instead of MGS against the whole basis.

Loss of Orthogonality

In float32, Lanczos loses orthogonality after \(\sim 10\) iterations due to round-off. Re-orthogonalization is not built-in; use Arnoldi if orthogonality is critical.


Randomized SVD

Approximates \(A \approx U \cdot \Sigma \cdot V^T\) using random projections (Halko, 2011).

Algorithm

1. Ω = random n × (k+p) Gaussian matrix
2. Y = A · Ω                    — sketch (m × (k+p))
3. Y = Q · R                    — QR of sketch
4. B = Q^T · A                  — project to (k+p) × n
5. B = Ũ · Σ · V^T              — full SVD of small B
6. U = Q · Ũ,  Σ,  V           — reconstruct

Dense Matrix

size_t       tiny_rsvd_dense_f32_workspace_size(int m, int n, int k, int p, int q);
tiny_error_t tiny_rsvd_dense_f32(const float *A, int m, int n, int k, int p, int q,
                                   float *U, float *S, float *Vt,
                                   void *ws, size_t ws_size);

Operator Form

size_t       tiny_rsvd_f32_workspace_size(int m, int n, int k, int p, int q);
tiny_error_t tiny_rsvd_f32(const tiny_operator_f32_t *op, int m, int n, int k,
                            float *U, float *S, float *Vt,
                            void *ws, size_t ws_size);

Parameters:

| Param | Type | Description |
|---|---|---|
| A / op | const float * / const tiny_operator_f32_t * | Input matrix (dense form) or linear operator (operator form) |
| m, n | int | Matrix dimensions |
| k | int | Target rank (number of singular values/vectors) |
| p | int | Oversampling (default 5). Extra columns beyond \(k\) |
| q | int | Power iterations (default 1). Each pass improves accuracy for slow decay |
| U | float * | Output. \(m \times k\) left singular vectors (column-major) |
| S | float * | Output. \(k\) singular values (descending) |
| Vt | float * | Output. \(k \times n\) right singular vectors (row-major) |
| ws | void * | Workspace buffer |
| ws_size | size_t | Workspace size in bytes |

Workspace: \(\sim (m \cdot r + n \cdot r + r^2)\) floats where \(r = k + p\). For \(m=75, n=50, k=10\): \(\sim 8\) KB.

RSVD Performance

RSVD with \(k=10, p=5, q=1\) gives SVD accuracy comparable to full Jacobi at 10× speed for \(50 \times 50\) matrices. Used as the primary SVD in the pairwise ERA implementation.


SVD Utilities

void  tiny_svd_cumulative_energy_f32(const float *S, int len, float *energy_out);
int   tiny_svd_rank_by_energy_f32(const float *S, int len, float threshold);
int   tiny_svd_rank_by_threshold_f32(const float *S, int len, float tol);
float tiny_svd_condition_estimate_f32(const float *S, int len);
| Function | Purpose | Example |
|---|---|---|
| cumulative_energy | \(\text{energy}[i] = \frac{\sum_{j=0}^{i} S_j^2}{\sum_j S_j^2}\) | Track how many SVs explain 99% energy |
| rank_by_energy | First index where cumulative energy \(\geq\) threshold | rank_by_energy(S, len, 0.99) → model order |
| rank_by_threshold | Count of \(S_k > \text{tol} \cdot S_0\) | rank_by_threshold(S, len, 1e-3) → numerical rank |
| condition_estimate | \(S_0 / S_{n-1}\) | Condition number \(\kappa\); if \(> 10^6\), ill-conditioned |

Engineering Notes

  • Random seed: RSVD uses a Box-Muller Gaussian RNG with static state. The sequence is deterministic across boot cycles.
  • Power iteration: Each power pass sharpens the spectrum (effectively raising the singular values to \(\sigma_i^{2q+1}\)), improving accuracy when the singular values decay slowly. For most SHM data with exponential decay, \(q=1\) is sufficient.
  • Operator vs dense: If the matrix fits in RAM (\(< 100 \times 100\)), use tiny_rsvd_dense_f32. The operator form only helps for matrices too large to store explicitly (e.g., Toeplitz as an operator).