Decomp — Matrix Decompositions¶
Overview¶
Two QR decomposition implementations: Modified Gram-Schmidt (MGS) and Householder QR (HQR). Level 3 in the TinyMath dependency chain.
Header: #include "tiny_decomp.h"
Which QR to Use?¶
| Criterion | MGS | HQR |
|---|---|---|
| Speed | Fast (\(O(m \cdot k \cdot n)\)) | Slower (\(O(m \cdot n^2)\)) |
| Orthogonality | Good (re-ortho option) | Excellent (backward stable) |
| Matrix shape | \(m \geq k\) (tall) | \(m \geq n\) (general) |
| In-place | Yes (overwrites \(A\) with \(Q\)) | No (separate \(Q\), \(R\) output) |
| Use case | ERA / ITD / Least-squares | SSI / SVD via bidiagonalization |
Rule of thumb: For ERA where speed matters and \(m \gg n\), use MGS with reorthogonalize = 1. For SSI where orthogonality is critical, use HQR.
MGS QR¶
Modified Gram-Schmidt computes \(A_{m \times k} = Q_{m \times k} \cdot R_{k \times k}\).
Workspace Size¶
Returns the required workspace in bytes. Currently returns 0 (no external workspace needed — the implementation uses local arrays).
Core Decomposition¶
tiny_error_t tiny_mgs_f32(float *A, int m, int k,
float *R,
int reorthogonalize,
void *ws, size_t ws_size);
Parameters:
| Param | Type | Description |
|---|---|---|
A | float * | Input/Output. On entry: \(m \times k\) matrix. On exit: overwritten with \(Q\) (row-major, column stride = \(k\)) |
m | int | Number of rows. Must satisfy \(m \geq k\) |
k | int | Number of columns (also the rank of \(R\)) |
R | float * | Output. \(k \times k\) upper-triangular matrix (row-major, step = \(k\)). Lower triangle is zeroed |
reorthogonalize | int | 0 = single pass, 1 = double pass (recommended). Improves orthogonality from \(\sim 10^{-4}\) to \(\sim 10^{-7}\) |
ws | void * | Workspace buffer (currently unused, may be NULL) |
ws_size | size_t | Size of workspace in bytes (currently unused) |
Returns: TINY_OK on success, or error code on invalid input.
Always Use Re-orthogonalization
Without re-orthogonalization, MGS in float32 loses orthogonality when column condition number exceeds \(10^3\). In ERA, \(H_0\) condition can exceed \(10^4\) — always set reorthogonalize = 1.
Orthogonality Check¶
Computes \(\|Q^T Q - I\|_F\) — a value near 0 indicates good orthogonality. Use this to verify numerical quality after decomposition.
Householder QR¶
Householder QR computes \(A_{m \times n} = Q_{m \times m} \cdot R_{m \times n}\) using reflection matrices.
Workspace Size¶
Returns the required workspace in bytes. Includes memory for intermediate reflection vectors and the accumulated \(Q\) matrix.
Core Decomposition¶
tiny_error_t tiny_hqr_f32(const float *A, int m, int n,
float *Q, float *R,
void *ws, size_t ws_size);
Parameters:
| Param | Type | Description |
|---|---|---|
A | const float * | Input only. \(m \times n\) matrix (not modified) |
m | int | Number of rows |
n | int | Number of columns |
Q | float * | Output. \(m \times m\) orthogonal matrix (row-major) |
R | float * | Output. \(m \times n\) upper-triangular matrix (row-major). Values below the diagonal are zeroed |
ws | void * | Workspace buffer. Must be at least tiny_hqr_f32_workspace_size(m, n) bytes |
ws_size | size_t | Size of workspace in bytes |
Returns: TINY_OK on success, or error code on failure.
Engineering Notes¶
How Each Method Works¶
MGS processes columns one at a time: 1. Copy column \(j\) of \(A\) 2. Project it onto each previously orthogonalized column \(q_0 \dots q_{j-1}\) 3. Subtract the projections → orthogonal to all previous columns 4. Normalize → this becomes \(q_j\)
HQR uses Householder reflections that zero out entire sub-diagonal blocks in one reflection — more efficient per reflection but requires \(O(m \cdot n^2)\) total.
Performance Guidance¶
| Problem Size | Recommendation |
|---|---|
| \(n \leq 8\) | Either — speed difference negligible |
| \(n \leq 50\), \(m \gg n\) | MGS with reorthogonalize = 1 (ERA/ITD) |
| \(n \leq 50\), general shape | HQR (SSI) |
| Precision-critical | HQR (backward stable, \(Q^T Q \approx I\) at machine precision) |
Memory Layout¶
Both algorithms assume row-major, contiguous storage (no padding, stride = columns). Non-contiguous matrices must be copied to contiguous buffers before calling.