Linalg — Implementation

File Structure

linalg/
├── tiny_linalg.h        (36 lines — public API declarations)
└── tiny_linalg.c        (516 lines — implementation)

Dependencies: tiny_math_config.h, tiny_constants.h, plus <math.h> and <string.h>.


Design Pattern

Every function follows a consistent pattern:

  1. Validate inputs (null pointers, dimensions)
  2. Dispatch to ESP-DSP if conditions are met (padding=0, stride=1, ESP32 platform)
  3. Fallback to generic C loop for all other cases

Example: tiny_mat_add_f32

tiny_error_t tiny_mat_add_f32(const float *input1, const float *input2,
                               float *output, int rows, int cols,
                               int padd1, int padd2, int padd_out,
                               int stride1, int stride2, int stride_out)
{
    // 1. Validate
    if (NULL == input1 || NULL == input2 || NULL == output)
        return TINY_ERR_MATH_NULL_POINTER;

    // 2. Platform dispatch
    #if MCU_PLATFORM_SELECTED == MCU_PLATFORM_ESP32
    if (padd1 == 0 && padd2 == 0 && padd_out == 0 &&
        stride1 == 1 && stride2 == 1 && stride_out == 1) {
        dspm_add_f32(input1, input2, output, rows, cols,
                     0, 0, 0, 1, 1, 1);
        return TINY_OK;
    }
    #endif

    // 3. Generic fallback — row-major with stride
    const int in1_step = cols + padd1;
    const int in2_step = cols + padd2;
    const int out_step = cols + padd_out;

    for (int row = 0; row < rows; row++) {
        int base_in1 = row * in1_step;
        int base_in2 = row * in2_step;
        int base_out = row * out_step;

        for (int col = 0; col < cols; col++) {
            int idx_in1 = base_in1 + col * stride1;
            int idx_in2 = base_in2 + col * stride2;
            int idx_out = base_out + col * stride_out;

            output[idx_out] = input1[idx_in1] + input2[idx_in2];
        }
    }
    return TINY_OK;
}

Memory Layout Model

TinyMath matrices are stored in row-major format with optional padding:

Row-major (no padding):            Row-major (padding=2):
┌───┬───┬───┬───┐                  ┌───┬───┬───┬───┬───┬───┐
│ a │ b │ c │ d │ ← cols=4         │ a │ b │ c │ d │ - │ - │
├───┼───┼───┼───┤                  ├───┼───┼───┼───┼───┼───┤
│ e │ f │ g │ h │                  │ e │ f │ g │ h │ - │ - │
└───┴───┴───┴───┘                  └───┴───┴───┴───┴───┴───┘
cols=4, step=4                     cols=4, step=6

The storage index for element \((i, j)\) is:

index = i × (cols + padding) + j × stride

Why Padding?

Padding exists for compatibility with ESP-DSP and other SIMD-optimised libraries that require power-of-two row alignment for vectorised loads. In most TinySHM workflows (small matrices, n ≤ 50), padding = 0 is optimal.


Platform Dispatch Mechanism

#if MCU_PLATFORM_SELECTED == MCU_PLATFORM_ESP32
    // ── Accelerated path ──
    // Requires: contiguous storage (padding=0, stride=1)
    dspm_add_f32(...);       // ESP-DSP, 2-5× faster
#else
    // ── Generic C fallback ──
    for (...) output[i] = a[i] + b[i];  // portable, slow
#endif

The same pattern is used in sub, mult, addc, subc, and multc. The accelerated path is only taken when all matrices are dense (no padding, unit stride).


Function Groups by Implementation Strategy

Group                Validation       ESP-DSP Path              Generic Algorithm
───────────────────  ───────────────  ────────────────────────  ──────────────────────────────────
add / sub            null ptr, dims   dspm_add/sub_f32          Row-major loop with stride
addc / subc / multc  null ptr, dims   dsps_add/sub/mulc_f32     Loop with scalar operand
mult                 null ptr, dims   dspm_mult_f32             Triple-nested loop \(O(mnk)\)
mult_ex              null ptr, dims   —                         Triple-nested with padding offsets
matvec               null ptr, dims   dsps_dotprod_f32 per row  Dot product loop
transpose            null ptr, dims   —                         Cache-friendly row iteration
eye / zero / fill    null ptr, dims   memset                    Simple C loop
hasnan / hasinf      null ptr         —                         isnan() / isinf() scan