Linalg — Implementation
File Structure
linalg/
├── tiny_linalg.h (36 lines — public API declarations)
└── tiny_linalg.c (516 lines — implementation)
Dependencies: tiny_math_config.h → tiny_constants.h, plus <math.h> and <string.h>.
Design Pattern
Every function follows the same three-step pattern:
- Validate inputs (null pointers, dimensions)
- Dispatch to ESP-DSP when the build targets ESP32 and every operand is contiguous (padding = 0, stride = 1)
- Fall back to a generic C loop in all other cases
Example: tiny_mat_add_f32
tiny_error_t tiny_mat_add_f32(const float *input1, const float *input2,
                              float *output, int rows, int cols,
                              int padd1, int padd2, int padd_out,
                              int stride1, int stride2, int stride_out)
{
    // 1. Validate
    if (NULL == input1 || NULL == input2 || NULL == output)
        return TINY_ERR_MATH_NULL_POINTER;

    // 2. Platform dispatch
#if MCU_PLATFORM_SELECTED == MCU_PLATFORM_ESP32
    if (padd1 == 0 && padd2 == 0 && padd_out == 0 &&
        stride1 == 1 && stride2 == 1 && stride_out == 1) {
        dspm_add_f32(input1, input2, output, rows, cols,
                     0, 0, 0, 1, 1, 1);
        return TINY_OK;
    }
#endif

    // 3. Generic fallback: row-major with stride
    const int in1_step = cols + padd1;
    const int in2_step = cols + padd2;
    const int out_step = cols + padd_out;
    for (int row = 0; row < rows; row++) {
        int base_in1 = row * in1_step;
        int base_in2 = row * in2_step;
        int base_out = row * out_step;
        for (int col = 0; col < cols; col++) {
            int idx_in1 = base_in1 + col * stride1;
            int idx_in2 = base_in2 + col * stride2;
            int idx_out = base_out + col * stride_out;
            output[idx_out] = input1[idx_in1] + input2[idx_in2];
        }
    }
    return TINY_OK;
}
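A minimal usage sketch of the generic path, assuming nothing beyond standard C. Every name prefixed `demo_` is a hypothetical stand-in and not part of TinyMath; `demo_mat_add_f32` only mirrors the fallback loop above so that the snippet compiles without the library headers. It adds a dense 2×3 matrix to a 2×3 matrix whose rows carry two padding floats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TinyMath error codes. */
typedef int demo_error_t;
#define DEMO_OK               0
#define DEMO_ERR_NULL_POINTER (-1)

/* Portable copy of the generic fallback loop (no ESP-DSP dispatch). */
static demo_error_t demo_mat_add_f32(const float *in1, const float *in2,
                                     float *out, int rows, int cols,
                                     int padd1, int padd2, int padd_out,
                                     int stride1, int stride2, int stride_out)
{
    if (NULL == in1 || NULL == in2 || NULL == out)
        return DEMO_ERR_NULL_POINTER;
    assert(rows >= 0 && cols >= 0);

    for (int row = 0; row < rows; row++) {
        /* Each operand advances by its own row step: cols + padding. */
        const float *r1 = in1 + row * (cols + padd1);
        const float *r2 = in2 + row * (cols + padd2);
        float *ro = out + row * (cols + padd_out);
        for (int col = 0; col < cols; col++)
            ro[col * stride_out] = r1[col * stride1] + r2[col * stride2];
    }
    return DEMO_OK;
}

/* Adds a dense 2x3 matrix to a 2x3 matrix stored with padding = 2. */
static demo_error_t demo_padded_add(float out[6])
{
    const float dense[6] = { 1, 2, 3,
                             4, 5, 6 };
    const float padded[10] = { 10, 20, 30, -1, -1,   /* -1 marks padding */
                               40, 50, 60, -1, -1 };
    return demo_mat_add_f32(dense, padded, out, 2, 3,
                            0, 2, 0,    /* padd1, padd2, padd_out */
                            1, 1, 1);   /* unit stride everywhere */
}
```

Calling `demo_padded_add(out)` leaves `out` holding `{11, 22, 33, 44, 55, 66}`; the padding floats are never read because the second operand's row step is 5 while only 3 columns are visited.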
Memory Layout Model
TinyMath matrices are stored in row-major format with optional padding:
Row-major (no padding):        Row-major (padding=2):
┌───┬───┬───┬───┐              ┌───┬───┬───┬───┬───┬───┐
│ a │ b │ c │ d │  ← cols=4    │ a │ b │ c │ d │ - │ - │
├───┼───┼───┼───┤              ├───┼───┼───┼───┼───┼───┤
│ e │ f │ g │ h │              │ e │ f │ g │ h │ - │ - │
└───┴───┴───┴───┘              └───┴───┴───┴───┴───┴───┘
cols=4, step=4                 cols=4, step=6
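The storage index for element \((i, j)\) is:

\[ \text{index}(i, j) = i \cdot (\text{cols} + \text{padding}) + j \cdot \text{stride} \]

where \(\text{cols} + \text{padding}\) is the row step shown in the diagram.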
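The generic loop in `tiny_mat_add_f32` computes each offset as `row * (cols + padd) + col * stride`. As an illustration, the same arithmetic can be wrapped in a small helper; `demo_mat_index` is a hypothetical name and is not part of the TinyMath API.

```c
#include <assert.h>
#include <stddef.h>

/* Flat offset of element (i, j) in a row-major matrix with row padding.
   Hypothetical helper, shown only to make the index arithmetic concrete. */
static size_t demo_mat_index(int i, int j, int cols, int padding, int stride)
{
    assert(i >= 0 && j >= 0 && cols > 0 && padding >= 0 && stride >= 1);
    const int step = cols + padding;   /* floats from one row start to the next */
    return (size_t)i * (size_t)step + (size_t)j * (size_t)stride;
}
```

For the padded layout in the diagram (cols = 4, padding = 2, stride = 1), element (1, 2), the `g` entry, sits at offset 1·6 + 2 = 8. With padding = 0 the same element sits at offset 1·4 + 2 = 6.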
Why Padding?
Padding exists for compatibility with ESP-DSP and other SIMD-optimised libraries that require power-of-two row alignment for vectorised loads. In most TinySHM workflows (small matrices, n ≤ 50), padding = 0 is optimal.
Platform Dispatch Mechanism
#if MCU_PLATFORM_SELECTED == MCU_PLATFORM_ESP32
    // ── Accelerated path ──
    // Requires: contiguous storage (padding=0, stride=1)
    dspm_add_f32(...);                    // ESP-DSP, 2-5× faster
#else
    // ── Generic C fallback ──
    for (...) output[i] = a[i] + b[i];    // portable, slow
#endif
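The same pattern is used in sub, mult, addc, subc, and multc. On ESP32 builds the accelerated path is taken only when all matrices are dense (no padding, unit stride); padded or strided operands fall through to the generic loop, as in tiny_mat_add_f32 above.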
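To make the two-level decision concrete, here is a hedged sketch of the pattern applied to subtraction. The platform macros and all `demo_` names are assumptions made for illustration, and the accelerated call is deliberately left out so the snippet builds on any host. The point it shows is that the platform is chosen at compile time, while the dense check happens at run time, which is why the generic loop sits after `#endif` rather than inside an `#else`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical platform ids; the real macros come from the TinyMath config. */
#define DEMO_PLATFORM_GENERIC 0
#define DEMO_PLATFORM_ESP32   1
#ifndef DEMO_PLATFORM_SELECTED
#define DEMO_PLATFORM_SELECTED DEMO_PLATFORM_GENERIC
#endif

/* True when all three operands are contiguous: no padding, unit stride. */
static bool demo_all_dense(int padd1, int padd2, int padd_out,
                           int stride1, int stride2, int stride_out)
{
    return padd1 == 0 && padd2 == 0 && padd_out == 0 &&
           stride1 == 1 && stride2 == 1 && stride_out == 1;
}

/* Element-wise subtraction; returns 0 on success, -1 on a NULL argument. */
static int demo_mat_sub_f32(const float *in1, const float *in2, float *out,
                            int rows, int cols,
                            int padd1, int padd2, int padd_out,
                            int stride1, int stride2, int stride_out)
{
    if (NULL == in1 || NULL == in2 || NULL == out)
        return -1;
    assert(rows >= 0 && cols >= 0);

#if DEMO_PLATFORM_SELECTED == DEMO_PLATFORM_ESP32
    if (demo_all_dense(padd1, padd2, padd_out, stride1, stride2, stride_out)) {
        /* An accelerated kernel would be called here, followed by an early
           return. It is omitted so the sketch stays portable. */
    }
#endif

    /* Generic loop. In this sketch it always runs; in the real pattern it is
       skipped whenever the accelerated branch returns early. */
    for (int row = 0; row < rows; row++) {
        const float *r1 = in1 + row * (cols + padd1);
        const float *r2 = in2 + row * (cols + padd2);
        float *ro = out + row * (cols + padd_out);
        for (int col = 0; col < cols; col++)
            ro[col * stride_out] = r1[col * stride1] - r2[col * stride2];
    }
    return 0;
}
```

Keeping the dense test in one predicate means every operation in the family can share the same condition, so a padded or strided call can never reach a kernel that assumes contiguous storage.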
Function Groups by Implementation Strategy
| Group | Validation | ESP-DSP Path | Generic Algorithm |
|---|---|---|---|
| add / sub / mult | null ptr, dims | dspm_add/sub/mul_f32 | Row-major loop with stride |
| addc / subc / multc | null ptr, dims | dsps_add/sub/mulc_f32 | Loop with scalar operand |
| mult | null ptr, dims | dspm_mult_f32 | Triple-nested loop \(O(mnk)\) |
| mult_ex | null ptr, dims | — | Triple-nested with padding offsets |
| matvec | null ptr, dims | dsps_dotprod_f32 per row | Dot product loop |
| transpose | null ptr, dims | — | Cache-friendly row iteration |
| eye / zero / fill | null ptr, dims | memset | Simple C loop |
| hasnan / hasinf | null ptr | — | isnan() / isinf() scan |
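Two of the generic algorithms in the table can be sketched in a few lines. The code below is an illustration under stated assumptions, not the TinyMath source: the `demo_` names, the parameter order, and the integer return codes are all hypothetical. It shows a triple-nested multiply where each matrix has its own row padding, and a scan that reports NaN or infinite entries while skipping padding.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* C (m x k) = A (m x n) * B (n x k), row-major, per-matrix row padding.
   Returns 0 on success, -1 on NULL input, -2 on invalid dimensions. */
static int demo_mat_mult_padded_f32(const float *A, const float *B, float *C,
                                    int m, int n, int k,
                                    int padd_a, int padd_b, int padd_c)
{
    if (NULL == A || NULL == B || NULL == C)
        return -1;
    if (m <= 0 || n <= 0 || k <= 0 || padd_a < 0 || padd_b < 0 || padd_c < 0)
        return -2;

    const int a_step = n + padd_a;   /* row step of A */
    const int b_step = k + padd_b;   /* row step of B */
    const int c_step = k + padd_c;   /* row step of C */

    for (int i = 0; i < m; i++) {
        for (int j = 0; j < k; j++) {
            float acc = 0.0f;
            for (int p = 0; p < n; p++)
                acc += A[i * a_step + p] * B[p * b_step + j];
            C[i * c_step + j] = acc;   /* O(m*n*k) multiply-adds in total */
        }
    }
    return 0;
}

/* True if any of the rows x cols elements is NaN or infinite.
   Padding floats are not inspected. */
static bool demo_mat_has_nonfinite_f32(const float *mat, int rows, int cols,
                                       int padding)
{
    assert(mat != NULL);
    const int step = cols + padding;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            const float v = mat[i * step + j];
            if (isnan(v) || isinf(v))
                return true;
        }
    }
    return false;
}
```

Running the scan on the output of a multiply is a cheap way to catch overflow or uninitialised input before the result is used further.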