Iterative — Krylov Methods & Randomized SVD¶
Overview¶
Krylov subspace methods (Arnoldi, Lanczos), Randomized SVD, and the linear operator interface. Level 5 in the TinyMath dependency chain — the highest level.
Header: `#include "tiny_iterative.h"`
Operator Interface¶
The operator abstraction lets iterative methods work with any linear transformation without storing the full matrix.
```c
typedef struct {
    int rows, cols;
    void *ctx;
    tiny_error_t (*matvec)(const float *x, float *y, void *ctx);
    tiny_error_t (*rmatvec)(const float *x, float *y, void *ctx);
} tiny_operator_f32_t;
```
Pre-built Operators¶
| Init Function | Matrix Structure | Storage | Use Case |
|---|---|---|---|
| `tiny_dense_operator_init` | General contiguous matrix | \(O(m \cdot n)\) | General purpose |
| `tiny_hankel_operator_init` | Hankel from signal \(s(t)\) | \(O(N)\) | ERA/SSI correlation (implicit) |
| `tiny_block_hankel_operator_init` | Block Hankel from \(R_{lag}[ch][ch]\) | \(O(n_{ch}^2 \cdot N_{lags})\) | SSI Toeplitz-free |
| `tiny_covariance_operator_init` | Covariance from eigenvalues | \(O(n_{ch} \cdot i_{max})\) | SSI covariance |
Block-Hankel Memory Savings
Instead of constructing a \(T_{dim} \times T_{dim}\) Toeplitz matrix (\(O(n^2)\) memory), the operator computes each matvec on the fly (\(O(N \cdot n)\) time). For \(T_{dim} = 100\), this saves \(\sim 40\) KB of RAM.
Arnoldi Iteration¶
```c
size_t tiny_arnoldi_f32_workspace_size(int n, int k);
tiny_error_t tiny_arnoldi_f32(const tiny_operator_f32_t *op,
                              const float *v0, int n, int k,
                              float *V, float *H,
                              void *ws, size_t ws_size);
```
Arnoldi builds an orthonormal Krylov basis via modified Gram-Schmidt (MGS). Given operator \(A\) and starting vector \(v_0\), it computes \(V_k = [v_1, ..., v_k]\) spanning \(\mathcal{K}_k(A, v_0)\), plus the upper Hessenberg \(H_k\).
Parameters:
| Param | Type | Description |
|---|---|---|
| `op` | `const tiny_operator_f32_t *` | Linear operator (matvec) |
| `v0` | `const float *` | Starting vector, length \(n\) |
| `n` | `int` | Problem dimension |
| `k` | `int` | Krylov subspace dimension (number of iterations) |
| `V` | `float *` | Output. \(n \times k\) basis, column-major (`V[i + j*n]` = column \(j\)) |
| `H` | `float *` | Output. \(k \times k\) upper Hessenberg (row-major) |
| `ws` | `void *` | Workspace buffer |
| `ws_size` | `size_t` | Workspace size in bytes |
Returns: TINY_OK on success.
Eigenvalues of \(H_k\) approximate extreme eigenvalues of \(A\) (Ritz values).
Use cases: Subspace iteration for top-\(k\) eigenvalues, sparse eigenvalue problems.
Lanczos Iteration¶
```c
size_t tiny_lanczos_f32_workspace_size(int n, int k);
tiny_error_t tiny_lanczos_f32(const tiny_operator_f32_t *op,
                              const float *v0, int n, int k,
                              float *alpha, float *beta, float *V,
                              void *ws, size_t ws_size);
```
Lanczos is Arnoldi specialized to symmetric operators (\(A = A^T\)): the Hessenberg matrix collapses to a tridiagonal one, stored as `alpha` (diagonal) and `beta` (off-diagonal).
Parameters:
| Param | Type | Description |
|---|---|---|
| `op` | `const tiny_operator_f32_t *` | Linear operator (must be symmetric) |
| `v0` | `const float *` | Starting vector, length \(n\) |
| `n` | `int` | Problem dimension |
| `k` | `int` | Krylov subspace dimension |
| `alpha` | `float *` | Output. Diagonal elements (\(k\) values) |
| `beta` | `float *` | Output. Off-diagonal elements (\(k\) values) |
| `V` | `float *` | Output. \(n \times k\) basis, column-major |
| `ws` | `void *` | Workspace buffer |
| `ws_size` | `size_t` | Workspace size in bytes |
Performance: 2× faster than Arnoldi (only one MGS orthogonalization per step).
Loss of Orthogonality
In float32, Lanczos loses orthogonality after \(\sim 10\) iterations due to round-off. Re-orthogonalization is not built-in; use Arnoldi if orthogonality is critical.
Randomized SVD¶
Approximates \(A \approx U \cdot \Sigma \cdot V^T\) using random projections (Halko et al., 2011).
Algorithm¶
1. Ω = random n × (k+p) Gaussian matrix
2. Y = A · Ω — sketch (m × (k+p))
3. Y = Q · R — QR of sketch
4. B = Q^T · A — project to (k+p) × n
5. B = Ũ · Σ · V^T — full SVD of small B
6. U = Q · Ũ, Σ, V — reconstruct
Dense Matrix¶
```c
size_t tiny_rsvd_dense_f32_workspace_size(int m, int n, int k, int p, int q);
tiny_error_t tiny_rsvd_dense_f32(const float *A, int m, int n, int k, int p, int q,
                                 float *U, float *S, float *Vt,
                                 void *ws, size_t ws_size);
```
Operator Form¶
```c
size_t tiny_rsvd_f32_workspace_size(int m, int n, int k, int p, int q);
tiny_error_t tiny_rsvd_f32(const tiny_operator_f32_t *op, int m, int n, int k,
                           float *U, float *S, float *Vt,
                           void *ws, size_t ws_size);
```
Parameters:
| Param | Type | Description |
|---|---|---|
| `A` / `op` | — | Input matrix or operator |
| `m`, `n` | `int` | Matrix dimensions |
| `k` | `int` | Target rank (number of singular values/vectors) |
| `p` | `int` | Oversampling (default 5). Extra columns beyond \(k\) |
| `q` | `int` | Power iterations (default 1). Each pass improves accuracy for slow decay |
| `U` | `float *` | Output. \(m \times k\) left singular vectors (column-major) |
| `S` | `float *` | Output. \(k\) singular values (descending) |
| `Vt` | `float *` | Output. \(k \times n\) right singular vectors (row-major) |
| `ws` | `void *` | Workspace buffer |
| `ws_size` | `size_t` | Workspace size in bytes |
Workspace: \(\sim (m \cdot r + n \cdot r + r^2)\) floats where \(r = k + p\). For \(m=75, n=50, k=10\): \(\sim 8\) KB.
RSVD Performance
RSVD with \(k=10, p=5, q=1\) matches full Jacobi SVD accuracy at roughly 10× the speed for \(50 \times 50\) matrices. Used as the primary SVD in the pairwise ERA implementation.
SVD Utilities¶
```c
void tiny_svd_cumulative_energy_f32(const float *S, int len, float *energy_out);
int tiny_svd_rank_by_energy_f32(const float *S, int len, float threshold);
int tiny_svd_rank_by_threshold_f32(const float *S, int len, float tol);
float tiny_svd_condition_estimate_f32(const float *S, int len);
```
| Function | Purpose | Example |
|---|---|---|
| `cumulative_energy` | \(\text{energy}[i] = \frac{\sum_{j=0}^{i} S_j^2}{\sum_j S_j^2}\) | Track how many SVs explain 99% energy |
| `rank_by_energy` | First index where cumulative energy \(\geq\) threshold | `rank_by_energy(S, len, 0.99)` → model order |
| `rank_by_threshold` | Count of \(S_k > \text{tol} \cdot S_0\) | `rank_by_threshold(S, len, 1e-3)` → numerical rank |
| `condition_estimate` | \(S_0 / S_{n-1}\) | Condition number \(\kappa\) — if \(> 10^6\), ill-conditioned |
Engineering Notes¶
- Random seed: RSVD uses a Box-Muller Gaussian RNG with static state, so results are deterministic across boot cycles.
- Power iteration: Each power pass effectively applies \((A A^T)^q A\), raising the singular-value ratios and improving accuracy when the spectrum decays slowly. For most SHM data with exponential decay, \(q=1\) is sufficient.
- Operator vs dense: If the matrix fits in RAM (\(< 100 \times 100\)), use `tiny_rsvd_dense_f32`. The operator form only helps for matrices too large to store explicitly (e.g., Toeplitz as an operator).