Decomp — Matrix Decompositions

Overview

Two QR decomposition implementations: Modified Gram-Schmidt (MGS) and Householder QR (HQR). Level 3 in the TinyMath dependency chain.

Header: `#include "tiny_decomp.h"`


Which QR to Use?

| Criterion | MGS | HQR |
|---|---|---|
| Speed | Fast (\(O(m \cdot k^2)\)) | Slower (\(O(m \cdot n^2)\)) |
| Orthogonality | Good (re-orthogonalization option) | Excellent (backward stable) |
| Matrix shape | \(m \geq k\) (tall) | \(m \geq n\) (general) |
| In-place | Yes (overwrites \(A\) with \(Q\)) | No (separate \(Q\), \(R\) outputs) |
| Use case | ERA / ITD / least-squares | SSI / SVD via bidiagonalization |

Rule of thumb: For ERA where speed matters and \(m \gg n\), use MGS with reorthogonalize = 1. For SSI where orthogonality is critical, use HQR.


MGS QR

Modified Gram-Schmidt computes \(A_{m \times k} = Q_{m \times k} \cdot R_{k \times k}\).

Workspace Size

```c
size_t tiny_mgs_f32_workspace_size(int m, int k);
```

Returns the required workspace in bytes. Currently returns 0 (no external workspace needed — the implementation uses local arrays).

Core Decomposition

```c
tiny_error_t tiny_mgs_f32(float *A, int m, int k,
                          float *R,
                          int reorthogonalize,
                          void *ws, size_t ws_size);
```

Parameters:

| Param | Type | Description |
|---|---|---|
| `A` | `float *` | Input/output. On entry: \(m \times k\) matrix. On exit: overwritten with \(Q\) (row-major, column stride = \(k\)) |
| `m` | `int` | Number of rows. Must satisfy \(m \geq k\) |
| `k` | `int` | Number of columns (also the order of \(R\)) |
| `R` | `float *` | Output. \(k \times k\) upper-triangular matrix (row-major, stride = \(k\)). Lower triangle is zeroed |
| `reorthogonalize` | `int` | 0 = single pass, 1 = double pass (recommended). Improves orthogonality from \(\sim 10^{-4}\) to \(\sim 10^{-7}\) |
| `ws` | `void *` | Workspace buffer (currently unused, may be `NULL`) |
| `ws_size` | `size_t` | Size of workspace in bytes (currently unused) |

Returns: TINY_OK on success, or error code on invalid input.

Always Use Re-orthogonalization

Without re-orthogonalization, MGS in float32 loses orthogonality once the column condition number exceeds \(10^3\). In ERA, the condition number of \(H_0\) can exceed \(10^4\), so always set reorthogonalize = 1.

Orthogonality Check

```c
float tiny_mgs_check_orthogonality_f32(const float *Q, int m, int k);
```

Computes \(\|Q^T Q - I\|_F\) — a value near 0 indicates good orthogonality. Use this to verify numerical quality after decomposition.


Householder QR

Householder QR computes \(A_{m \times n} = Q_{m \times m} \cdot R_{m \times n}\) using reflection matrices.

Workspace Size

```c
size_t tiny_hqr_f32_workspace_size(int m, int n);
```

Returns the required workspace in bytes. Includes memory for intermediate reflection vectors and the accumulated \(Q\) matrix.

Core Decomposition

```c
tiny_error_t tiny_hqr_f32(const float *A, int m, int n,
                          float *Q, float *R,
                          void *ws, size_t ws_size);
```

Parameters:

| Param | Type | Description |
|---|---|---|
| `A` | `const float *` | Input only. \(m \times n\) matrix (not modified) |
| `m` | `int` | Number of rows |
| `n` | `int` | Number of columns |
| `Q` | `float *` | Output. \(m \times m\) orthogonal matrix (row-major) |
| `R` | `float *` | Output. \(m \times n\) upper-triangular matrix (row-major). Values below the diagonal are zeroed |
| `ws` | `void *` | Workspace buffer. Must be at least `tiny_hqr_f32_workspace_size(m, n)` bytes |
| `ws_size` | `size_t` | Size of workspace in bytes |

Returns: TINY_OK on success, or error code on failure.


Engineering Notes

How Each Method Works

MGS processes columns one at a time:

1. Copy column \(j\) of \(A\)
2. Project it onto each previously orthogonalized column \(q_0 \dots q_{j-1}\)
3. Subtract the projections → orthogonal to all previous columns
4. Normalize → this becomes \(q_j\)

HQR applies one Householder reflection per column, each zeroing every entry below the diagonal of that column at once — more work per step is amortized across the whole column, for \(O(m \cdot n^2)\) total cost.

Performance Guidance

| Problem size | Recommendation |
|---|---|
| \(n \leq 8\) | Either — speed difference negligible |
| \(n \leq 50\), \(m \gg n\) | MGS with `reorthogonalize = 1` (ERA/ITD) |
| \(n \leq 50\), general shape | HQR (SSI) |
| Precision-critical | HQR (backward stable, \(Q^T Q \approx I\) at machine precision) |

Memory Layout

Both algorithms assume row-major, contiguous storage (no padding, stride = columns). Non-contiguous matrices must be copied to contiguous buffers before calling.
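The required copy can be done with a helper along these lines (a sketch; `pack_contiguous` is a hypothetical name, not part of the library):

```c
#include <string.h>

/* Pack a row-major matrix whose rows are spaced `stride` floats apart
 * (stride >= n, e.g. a padded or sliced view) into a contiguous
 * m x n buffer, as the decomposition routines require. */
void pack_contiguous(const float *src, int m, int n, int stride, float *dst) {
    for (int i = 0; i < m; ++i)
        memcpy(dst + (size_t)i * n, src + (size_t)i * stride,
               (size_t)n * sizeof(float));
}
```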