Decomp — Matrix Decompositions

Overview

Two QR decomposition implementations: Modified Gram-Schmidt (MGS) and Householder QR (HQR). Level 3 in the TinyMath dependency chain.

Header: `#include "tiny_decomp.h"`


Which QR to Use?

| Criterion | MGS | HQR |
|---|---|---|
| Speed | Fast (\(O(m \cdot k^2)\)) | Slower (\(O(m \cdot n^2)\)) |
| Orthogonality | Good (re-orthogonalization option) | Excellent (backward stable) |
| Matrix shape | \(m \geq k\) (tall) | \(m \geq n\) (general) |
| In-place | Yes (overwrites \(A\) with \(Q\)) | No (separate \(Q\), \(R\) outputs) |
| Use case | ERA / ITD / least-squares | SSI / SVD via bidiagonalization |

Rule of thumb: For ERA where speed matters and \(m \gg n\), use MGS with reorthogonalize = 1. For SSI where orthogonality is critical, use HQR.


MGS QR

Modified Gram-Schmidt computes \(A_{m \times k} = Q_{m \times k} \cdot R_{k \times k}\).

Workspace Size

```c
size_t tiny_mgs_f32_workspace_size(int m, int k);
```

Returns the required workspace in bytes. Currently returns 0 (no external workspace needed — the implementation uses local arrays).

Core Decomposition

```c
tiny_error_t tiny_mgs_f32(float *A, int m, int k,
                          float *R,
                          int reorthogonalize,
                          void *ws, size_t ws_size);
```

Parameters:

| Param | Type | Description |
|---|---|---|
| `A` | `float *` | Input/output. On entry: \(m \times k\) matrix. On exit: overwritten with \(Q\) (row-major, column stride = \(k\)) |
| `m` | `int` | Number of rows. Must satisfy \(m \geq k\) |
| `k` | `int` | Number of columns (also the order of \(R\)) |
| `R` | `float *` | Output. \(k \times k\) upper-triangular matrix (row-major, stride = \(k\)). Lower triangle is zeroed |
| `reorthogonalize` | `int` | 0 = single pass, 1 = double pass (recommended). Improves orthogonality from \(\sim 10^{-4}\) to \(\sim 10^{-7}\) |
| `ws` | `void *` | Workspace buffer (currently unused, may be `NULL`) |
| `ws_size` | `size_t` | Size of workspace in bytes (currently unused) |

Returns: TINY_OK on success, or error code on invalid input.

Always Use Re-orthogonalization

Without re-orthogonalization, MGS in float32 loses orthogonality once the column condition number exceeds \(10^3\). In ERA, the condition number of \(H_0\) can exceed \(10^4\), so always set reorthogonalize = 1.

Orthogonality Check

```c
float tiny_mgs_check_orthogonality_f32(const float *Q, int m, int k);
```

Computes \(\|Q^T Q - I\|_F\) — a value near 0 indicates good orthogonality. Use this to verify numerical quality after decomposition.


Householder QR

Householder QR computes \(A_{m \times n} = Q_{m \times m} \cdot R_{m \times n}\) using reflection matrices.

Workspace Size

```c
size_t tiny_hqr_f32_workspace_size(int m, int n);
```

Returns the required workspace in bytes. Includes memory for intermediate reflection vectors and the accumulated \(Q\) matrix.

Core Decomposition

```c
tiny_error_t tiny_hqr_f32(const float *A, int m, int n,
                          float *Q, float *R,
                          void *ws, size_t ws_size);
```

Parameters:

| Param | Type | Description |
|---|---|---|
| `A` | `const float *` | Input only. \(m \times n\) matrix (not modified) |
| `m` | `int` | Number of rows |
| `n` | `int` | Number of columns |
| `Q` | `float *` | Output. \(m \times m\) orthogonal matrix (row-major) |
| `R` | `float *` | Output. \(m \times n\) upper-triangular matrix (row-major). Values below the diagonal are zeroed |
| `ws` | `void *` | Workspace buffer. Must be at least `tiny_hqr_f32_workspace_size(m, n)` bytes |
| `ws_size` | `size_t` | Size of workspace in bytes |

Returns: TINY_OK on success, or error code on failure.


Engineering Notes

How Each Method Works

MGS processes columns one at a time:

1. Copy column \(j\) of \(A\)
2. Project it onto each previously orthogonalized column \(q_0 \dots q_{j-1}\)
3. Subtract the projections → orthogonal to all previous columns
4. Normalize → this becomes \(q_j\)

HQR applies one Householder reflection per column, each zeroing every entry below the diagonal of that column at once — more work per step is amortized across the whole column, for \(O(m \cdot n^2)\) total cost.

Performance Guidance

| Problem size | Recommendation |
|---|---|
| \(n \leq 8\) | Either — speed difference negligible |
| \(n \leq 50\), \(m \gg n\) | MGS with `reorthogonalize = 1` (ERA/ITD) |
| \(n \leq 50\), general shape | HQR (SSI) |
| Precision-critical | HQR (backward stable, \(Q^T Q \approx I\) at machine precision) |

Memory Layout

Both algorithms assume row-major, contiguous storage (no padding, stride = columns). Non-contiguous matrices must be copied to contiguous buffers before calling.
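The required copy can be done with a helper along these lines (a sketch; `pack_contiguous` is a hypothetical name, not part of the library):

```c
#include <string.h>

/* Pack a row-major matrix whose rows are spaced `stride` floats apart
 * (stride >= n, e.g. a padded or sliced view) into a contiguous
 * m x n buffer, as the decomposition routines require. */
void pack_contiguous(const float *src, int m, int n, int stride, float *dst) {
    for (int i = 0; i < m; ++i)
        memcpy(dst + (size_t)i * n, src + (size_t)i * stride,
               (size_t)n * sizeof(float));
}
```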