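Linalg: Linear Algebra

Overview

Basic element-wise matrix operations and matrix products. This is layer 1 of the TinyMath dependency chain: all higher-level modules (decomp, eigen, iterative) build on these primitives.

Header: #include "tiny_linalg.h"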

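Architecture

The module provides three tiers of operations:

Tier	Functions	Complexity
Element-wise arithmetic	`add` / `sub` / `mult`, plus the scalar variants `addc` / `subc` / `multc` (padding and stride supported)	\(O(r \cdot c)\)
Matrix multiplication	`mult` / `mult_ex` / `matvec`	\(O(m \cdot n \cdot k)\)
Utilities	`transpose` / `eye` / `zero` / `fill` / `diag` / `frob_norm` / `has_nan` / `has_inf` / `residual_norm`	\(O(r \cdot c)\)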

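Platform dispatch

Every function selects the fastest available implementation at compile time:

ESP32 platforms: vectorized operations are delegated to the ESP-DSP library (dsps_*, dspm_*)
Generic platforms: fall back to plain C loops

The choice is controlled by MCU_PLATFORM_SELECTED in tiny_math_config.h.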
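The dispatch described above can be pictured with a small preprocessor sketch. This is an illustration only: `SKETCH_PLATFORM_ESP32` and `sketch_vec_add` are hypothetical names, and the values that MCU_PLATFORM_SELECTED can take are not shown in this document. The ESP32 branch assumes ESP-DSP's `dsps_add_f32` with unit steps on all three buffers.

```c
#include <stddef.h>

/* Hypothetical switch. The real selection is made through
 * MCU_PLATFORM_SELECTED in tiny_math_config.h. */
#if defined(SKETCH_PLATFORM_ESP32)
#include "dsps_add.h"   /* ESP-DSP vector addition */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for contiguous vectors.
 * Returns 0 on success, -1 on invalid arguments or backend failure. */
static int sketch_vec_add(const float *a, const float *b, float *out, int len)
{
    if (a == NULL || b == NULL || out == NULL || len <= 0) {
        return -1;
    }
#if defined(SKETCH_PLATFORM_ESP32)
    /* Accelerated path: unit step on both inputs and the output. */
    return dsps_add_f32(a, b, out, len, 1, 1, 1) == ESP_OK ? 0 : -1;
#else
    /* Generic path: plain C loop. */
    for (int i = 0; i < len; ++i) {
        out[i] = a[i] + b[i];
    }
    return 0;
#endif
}
```

Because the branch is resolved by the preprocessor, only one of the two bodies ends up in the binary, so the dispatch itself costs nothing at run time.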

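Performance impact of padding and stride

The ESP-DSP accelerated path is used only when padding = 0 and stride = 1. Any non-zero padding or non-unit stride forces the generic C fallback, which can be as much as 10× slower.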
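To make the role of padding concrete, the sketch below adds two sub-blocks that live inside a larger row-major matrix. It is a plain C reference, not the library's implementation: `sketch_is_fast_path` and `sketch_add_padded` are hypothetical names, stride is fixed at 1, and a row is assumed to occupy `cols + padding` elements in memory.

```c
#include <stdbool.h>

/* Hypothetical predicate mirroring the rule stated above. */
static bool sketch_is_fast_path(int padding, int stride)
{
    return padding == 0 && stride == 1;
}

/* Hypothetical reference add with row padding and unit stride.
 * Assumption: consecutive rows are (cols + padding) elements apart. */
static void sketch_add_padded(const float *a, const float *b, float *c,
                              int rows, int cols, int pa, int pb, int pc)
{
    for (int i = 0; i < rows; ++i) {
        const float *ra = a + i * (cols + pa);   /* start of row i in a */
        const float *rb = b + i * (cols + pb);   /* start of row i in b */
        float *rc = c + i * (cols + pc);         /* start of row i in c */
        for (int j = 0; j < cols; ++j) {
            rc[j] = ra[j] + rb[j];
        }
    }
}
```

For a 2×4 parent matrix, the left and right 2×2 blocks are both addressed with `cols = 2` and padding 2 (the two columns skipped at the end of each row), while a contiguous 2×2 output uses padding 0. Copying such blocks into contiguous buffers first is one way to stay on the accelerated path when the same block is reused many times.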

Element-wise arithmetic

add / sub / mult and addc / subc / multc (scalar)

tiny_error_t tiny_mat_add_f32(const float *a, const float *b, float *c,
                               int rows, int cols,
                               int pa, int pb, int pc,
                               int sa, int sb, int sc);
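tiny_error_t tiny_mat_sub_f32(/* same signature */);
tiny_error_t tiny_mat_mult_f32(/* same signature */);

\(C_{ij} = A_{ij} \pm B_{ij}\) or \(C_{ij} = A_{ij} \cdot B_{ij}\)

Parameter	Description
`a, b, c`	Input matrices `a` and `b`; output matrix `c`
`rows, cols`	Logical matrix dimensions
`pa, pb, pc`	Row padding of `a`, `b`, `c` (elements in each row that are not part of the data)
`sa, sb, sc`	Column stride of `a`, `b`, `c` (usually `1`)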

tiny_error_t tiny_mat_addc_f32(const float *in, float *out, float C,
                                int rows, int cols,
                                int pi, int po, int si, int so);
tiny_error_t tiny_mat_subc_f32(/* same signature */);
tiny_error_t tiny_mat_multc_f32(/* same signature */);

\(Out_{ij} = In_{ij} \pm C\) or \(Out_{ij} = In_{ij} \cdot C\)
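A common use of the scalar variants is an affine rescale, \(Out_{ij} = s \cdot In_{ij} + t\), done as a multiply followed by an add. The sketch below shows the idea with plain C stand-ins for contiguous data; `sketch_multc`, `sketch_addc`, and `sketch_rescale` are hypothetical names, and running the operations in place (output equal to input) is a property of this sketch, not a documented guarantee of the library.

```c
/* Hypothetical contiguous stand-in for the scalar multiply. */
static void sketch_multc(const float *in, float *out, float C, int rows, int cols)
{
    for (int i = 0; i < rows * cols; ++i) {
        out[i] = in[i] * C;
    }
}

/* Hypothetical contiguous stand-in for the scalar add. */
static void sketch_addc(const float *in, float *out, float C, int rows, int cols)
{
    for (int i = 0; i < rows * cols; ++i) {
        out[i] = in[i] + C;
    }
}

/* Rescale a matrix in place: m = scale * m + offset. */
static void sketch_rescale(float *m, int rows, int cols, float scale, float offset)
{
    sketch_multc(m, m, scale, rows, cols);
    sketch_addc(m, m, offset, rows, cols);
}
```

With scale 2 and offset 1, the 2×2 matrix {0, 1, 2, 3} becomes {1, 3, 5, 7}.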

Matrix multiplication

Basic multiplication / Extended multiplication (with padding) / Matrix-vector

tiny_error_t tiny_mat_mult_f32(const float *A, const float *B, float *C,
                                int m, int n, int k);

\(C_{m \times n} = A_{m \times k} \cdot B_{k \times n}\)

All matrices must be contiguous (no padding). On ESP32 this uses ESP-DSP's dspm_mult_f32.
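For reference, the product can be written as the classic triple loop. The sketch follows the dimension convention stated above (A is m×k, B is k×n, C is m×n, arguments in the order m, n, k) and assumes row-major contiguous storage; `sketch_mat_mult` is a hypothetical name and is not how the library or ESP-DSP implements the operation.

```c
/* Hypothetical reference product: C (m x n) = A (m x k) * B (k x n),
 * all matrices row-major and contiguous. */
static void sketch_mat_mult(const float *A, const float *B, float *C,
                            int m, int n, int k)
{
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            C[i * n + j] = acc;
        }
    }
}
```

The three nested loops over m, n, and k are where the \(O(m \cdot n \cdot k)\) cost comes from. As a check, multiplying the 2×3 matrix {1,2,3,4,5,6} by the 3×2 matrix {7,8,9,10,11,12} gives the 2×2 result {58, 64, 139, 154}.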

tiny_error_t tiny_mat_mult_ex_f32(const float *A, const float *B, float *C,
                                   int Ar, int Ac, int Bc,
                                   int Ap, int Bp, int Cp);

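The same product, but each operand may carry row padding, which is useful when multiplying sub-blocks taken from larger matrices.

Parameter	Description
`Ar, Ac`	Rows of A; columns of A (= rows of B)
`Bc`	Columns of B
`Ap, Bp, Cp`	Row padding of each operand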
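The sub-block use case can be sketched in plain C as follows. `sketch_mat_mult_ex` is a hypothetical reference that mirrors the parameter list above; it assumes row-major storage in which each row of an operand occupies its logical column count plus its padding.

```c
/* Hypothetical reference product with per-operand row padding.
 * Assumption: the row pitch of each operand is (columns + padding). */
static void sketch_mat_mult_ex(const float *A, const float *B, float *C,
                               int Ar, int Ac, int Bc,
                               int Ap, int Bp, int Cp)
{
    const int a_pitch = Ac + Ap;   /* elements between rows of A */
    const int b_pitch = Bc + Bp;   /* elements between rows of B */
    const int c_pitch = Bc + Cp;   /* elements between rows of C */

    for (int i = 0; i < Ar; ++i) {
        for (int j = 0; j < Bc; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < Ac; ++p) {
                acc += A[i * a_pitch + p] * B[p * b_pitch + j];
            }
            C[i * c_pitch + j] = acc;
        }
    }
}
```

Taking the top-left 2×2 block of the 3×3 matrix {1,...,9} means `Ar = Ac = Bc = 2` with padding 1 for both inputs. Squaring that block into a contiguous output (`Cp = 0`) yields {9, 12, 24, 33}.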

tiny_error_t tiny_mat_matvec_f32(const float *A, const float *x, float *y,
                                  int rows, int cols);

\(y_{rows} = A_{rows \times cols} \cdot x_{cols}\), assuming contiguous storage with no padding.

float A[6] = {1,2,3,4,5,6};  // 2×3 matrix, row-major
float x[3] = {1,0,1};
float y[2];
tiny_mat_matvec_f32(A, x, y, 2, 3);
// y = {1*1 + 2*0 + 3*1, 4*1 + 5*0 + 6*1} = {4, 10}

Utilities

Matrix creation / Transpose and norm / Validation

tiny_error_t tiny_mat_eye_f32(float *A, int n);     // n×n identity matrix Iₙ
tiny_error_t tiny_mat_zero_f32(float *A, int r, int c); // all zeros
tiny_error_t tiny_mat_fill_f32(float *A, int r, int c, float val); // fill with a constant
tiny_error_t tiny_mat_diag_f32(const float *A, float *d, int n); // extract the diagonal

float I[9];
tiny_mat_eye_f32(I, 3);
// I = [1 0 0; 0 1 0; 0 0 1]

tiny_error_t tiny_mat_transpose_f32(const float *A, float *At, int rows, int cols);
float tiny_mat_frob_norm_f32(const float *A, int rows, int cols);

Transpose: \(A^T\), using cache-friendly row-by-row iteration, \(O(r \cdot c)\)
Frobenius norm: \(\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}\)
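As a quick sanity check of the norm definition, here is a worked example on an illustrative 2×3 matrix (the values are chosen for this example and do not come from the library's tests):

\[
A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 4 & 0 \end{pmatrix}, \qquad \|A\|_F = \sqrt{3^2 + 4^2} = 5
\]

Transposing only rearranges the entries, so \(\|A^T\|_F = \|A\|_F = 5\).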

int tiny_mat_has_nan_f32(const float *A, int rows, int cols);  // returns 1 if a NaN is found
int tiny_mat_has_inf_f32(const float *A, int rows, int cols);  // returns 1 if an Inf is found
float tiny_mat_residual_norm_f32(const float *A, const float *B, int rows, int cols);

residual_norm computes \(\|A - B\|_F\), which is handy as a convergence check in iterative solvers.
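The convergence check can be sketched with a tiny Jacobi iteration that stops once two successive iterates are close in the Frobenius sense. Everything here is illustrative: `sketch_residual_sq` and `sketch_jacobi_2x2` are hypothetical names, and the 2×2 system is made up for the example. To stay free of math library calls, the sketch compares the squared distance with `tol * tol`, which is equivalent to \(\|A - B\|_F < tol\); with the library, the same test would presumably read `tiny_mat_residual_norm_f32(next, x, 2, 1) < tol`.

```c
/* Hypothetical helper: squared Frobenius distance between two
 * contiguous rows x cols matrices. */
static float sketch_residual_sq(const float *A, const float *B, int rows, int cols)
{
    float sum = 0.0f;
    for (int i = 0; i < rows * cols; ++i) {
        const float d = A[i] - B[i];
        sum += d * d;
    }
    return sum;
}

/* Jacobi iteration for the example system 4x + y = 6, x + 3y = 7,
 * whose exact solution is (1, 2). x holds the initial guess on entry
 * and the latest iterate on exit. Returns the number of iterations
 * used, or -1 if max_iter is reached without converging. */
static int sketch_jacobi_2x2(float x[2], float tol, int max_iter)
{
    for (int it = 1; it <= max_iter; ++it) {
        float next[2];
        next[0] = (6.0f - x[1]) / 4.0f;
        next[1] = (7.0f - x[0]) / 3.0f;

        const float r2 = sketch_residual_sq(next, x, 2, 1);
        x[0] = next[0];
        x[1] = next[1];

        if (r2 < tol * tol) {
            return it;   /* successive iterates agree to within tol */
        }
    }
    return -1;
}
```

The system is diagonally dominant, so the iteration converges from the starting point (0, 0). Bounding the loop with `max_iter` keeps a diverging or stalled solve from spinning forever.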

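// Check a computed matrix for invalid entries
if (tiny_mat_has_nan_f32(result, 10, 10)) {
    // a NaN is present, e.g. from a 0/0 division or an overflow earlier in the computation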
}

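Return values

Functions declared with a tiny_error_t return type report one of the codes below. The exceptions are frob_norm and residual_norm, which return the computed float, and has_nan / has_inf, which return an int flag.

Code	Meaning
`TINY_OK`	Success
`TINY_ERR_MATH_NULL_POINTER`	A null pointer was passed
`TINY_ERR_MATH_INVALID_LENGTH`	A dimension is ≤ 0 or dimensions do not match
`TINY_ERR_MATH_ZERO_DIVISION`	Division by zero (where applicable)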
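A typical calling pattern is to check each status and propagate the first failure. The sketch below uses stand-in definitions so that it compiles on its own: `sketch_error_t`, its enumerators, `sketch_mat_fill`, and `sketch_init_workspace` are all hypothetical, the numeric values are placeholders, and the order of the checks (null pointer first, then dimensions) is an assumption. Real code would include tiny_linalg.h and use tiny_error_t with the codes from the table above.

```c
#include <stddef.h>

/* Stand-in for tiny_error_t; the values are placeholders. */
typedef enum {
    SKETCH_OK = 0,
    SKETCH_ERR_NULL_POINTER,
    SKETCH_ERR_INVALID_LENGTH
} sketch_error_t;

/* Hypothetical fill routine that validates its arguments the way the
 * table above describes. */
static sketch_error_t sketch_mat_fill(float *A, int r, int c, float val)
{
    if (A == NULL) {
        return SKETCH_ERR_NULL_POINTER;
    }
    if (r <= 0 || c <= 0) {
        return SKETCH_ERR_INVALID_LENGTH;
    }
    for (int i = 0; i < r * c; ++i) {
        A[i] = val;
    }
    return SKETCH_OK;
}

/* Caller pattern: stop at the first error and hand it upward.
 * Zeroes an r x c buffer, then sets its first row to ones. */
static sketch_error_t sketch_init_workspace(float *buf, int r, int c)
{
    sketch_error_t err = sketch_mat_fill(buf, r, c, 0.0f);
    if (err != SKETCH_OK) {
        return err;
    }
    return sketch_mat_fill(buf, 1, c, 1.0f);
}
```

Returning the code unchanged lets the outermost caller decide how to report the failure, which tends to matter on embedded targets where logging is limited.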