Linalg — Implementation

File Structure

linalg/
├── tiny_linalg.h        (36 lines — public API declarations)
└── tiny_linalg.c        (516 lines — implementation)

Dependencies: tiny_math_config.h, tiny_constants.h, plus <math.h> and <string.h>.


Design Pattern

Every function follows a consistent pattern:

  1. Validate inputs (null pointers, dimensions)
  2. Dispatch to ESP-DSP if conditions are met (padding=0, stride=1, ESP32 platform)
  3. Fallback to generic C loop for all other cases

Example: tiny_mat_add_f32

tiny_error_t tiny_mat_add_f32(const float *input1, const float *input2,
                               float *output, int rows, int cols,
                               int padd1, int padd2, int padd_out,
                               int stride1, int stride2, int stride_out)
{
    // 1. Validate
    if (NULL == input1 || NULL == input2 || NULL == output)
        return TINY_ERR_MATH_NULL_POINTER;

    // 2. Platform dispatch
    #if MCU_PLATFORM_SELECTED == MCU_PLATFORM_ESP32
    if (padd1 == 0 && padd2 == 0 && padd_out == 0 &&
        stride1 == 1 && stride2 == 1 && stride_out == 1) {
        dspm_add_f32(input1, input2, output, rows, cols,
                     0, 0, 0, 1, 1, 1);
        return TINY_OK;
    }
    #endif

    // 3. Generic fallback — row-major with stride
    const int in1_step = cols + padd1;
    const int in2_step = cols + padd2;
    const int out_step = cols + padd_out;

    for (int row = 0; row < rows; row++) {
        int base_in1 = row * in1_step;
        int base_in2 = row * in2_step;
        int base_out = row * out_step;

        for (int col = 0; col < cols; col++) {
            int idx_in1 = base_in1 + col * stride1;
            int idx_in2 = base_in2 + col * stride2;
            int idx_out = base_out + col * stride_out;

            output[idx_out] = input1[idx_in1] + input2[idx_in2];
        }
    }
    return TINY_OK;
}

Memory Layout Model

TinyMath matrices are stored in row-major format with optional padding:

Row-major (no padding):            Row-major (padding=2):
┌───┬───┬───┬───┐                  ┌───┬───┬───┬───┬───┬───┐
│ a │ b │ c │ d │ ← cols=4         │ a │ b │ c │ d │ - │ - │
├───┼───┼───┼───┤                  ├───┼───┼───┼───┼───┼───┤
│ e │ f │ g │ h │                  │ e │ f │ g │ h │ - │ - │
└───┴───┴───┴───┘                  └───┴───┴───┴───┴───┴───┘
cols=4, step=4                     cols=4, step=6

The storage index for element \((i, j)\) is:

index = i × (cols + padding) + j × stride

Why Padding?

Padding exists for compatibility with ESP-DSP and other SIMD-optimised libraries that require power-of-two row alignment for vectorised loads. In most TinySHM workflows (small matrices, n ≤ 50), padding = 0 is optimal.


Platform Dispatch Mechanism

#if MCU_PLATFORM_SELECTED == MCU_PLATFORM_ESP32
    // ── Accelerated path ──
    // Requires: contiguous storage (padding=0, stride=1)
    dspm_add_f32(...);       // ESP-DSP, 2-5× faster
#else
    // ── Generic C fallback ──
    for (...) output[i] = a[i] + b[i];  // portable, slow
#endif

The same pattern is used in sub, mult, addc, subc, and multc. The accelerated path is only taken when all matrices are dense (no padding, unit stride).


Function Groups by Implementation Strategy

Group                Validation       ESP-DSP Path              Generic Algorithm
───────────────────  ───────────────  ────────────────────────  ──────────────────────────────────
add / sub            null ptr, dims   dspm_add/sub_f32          Row-major loop with stride
addc / subc / multc  null ptr, dims   dsps_add/sub/mulc_f32     Loop with scalar operand
mult                 null ptr, dims   dspm_mult_f32             Triple-nested loop \(O(mnk)\)
mult_ex              null ptr, dims   —                         Triple-nested with padding offsets
matvec               null ptr, dims   dsps_dotprod_f32 per row  Dot product loop
transpose            null ptr, dims   —                         Cache-friendly row iteration
eye / zero / fill    null ptr, dims   memset                    Simple C loop
hasnan / hasinf      null ptr         —                         isnan() / isinf() scan