
Vector — Implementation

File Structure

vec/
├── tiny_vec.h        (51 lines — public API)
└── tiny_vec.c        (723 lines — implementation)

Dependencies: tiny_math_config.h and, on ESP32, the ESP-DSP headers (dsps_math.h, dsps_dotprod.h).


Design Pattern

Every function follows the same three-step pattern:

  1. Validate inputs (null pointers, lengths)
  2. Dispatch to ESP-DSP on the ESP32 platform when every stride is 1 (contiguous data)
  3. Fall back to a generic C loop with stride-based indexing

Example: tiny_vec_add_f32

tiny_error_t tiny_vec_add_f32(const float *input1, const float *input2,
                               float *output, int len,
                               int stride1, int stride2, int stride_out)
{
    // 1. Validate
    if (!input1 || !input2 || !output || len <= 0)
        return TINY_ERR_INVALID_ARG;

    // 2. ESP-DSP fast path (contiguous only)
#if MCU_PLATFORM_SELECTED == MCU_PLATFORM_ESP32
    if (stride1 == 1 && stride2 == 1 && stride_out == 1) {
        dsps_add_f32(input1, input2, output, len);
        return TINY_OK;
    }
#endif

    // 3. Generic fallback with stride
    for (int i = 0; i < len; i++)
        output[i * stride_out] = input1[i * stride1] + input2[i * stride2];

    return TINY_OK;
}

Strided Access Model

TinyVec supports strided access for all element-wise operations:

Stride = 1 (contiguous):          Stride = 3 (take every 3rd element):
┌───┬───┬───┬───┬───┬───┐        ┌───┬───┬───┬───┬───┬───┐
│ a │ b │ c │ d │ e │ f │        │ a │   │   │ b │   │   │
│ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │        │ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │
└───┴───┴───┴───┴───┴───┘        └───┴───┴───┴───┴───┴───┘

Element i is accessed at base[i * stride]. This is useful for extracting sub-vectors from matrices (for example, a column of a row-major matrix) or from interleaved data.
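As an illustration of the access model, the sketch below pulls one column out of a row-major matrix by reading with a stride equal to the row length. The helper names strided_gather_f32 and matrix_column_f32 are hypothetical and not part of the TinyVec API; they only mirror the base[i * stride] indexing rule.

```c
#include <stddef.h>

/* Hypothetical helper (not part of TinyVec): copy `len` elements from a
 * strided source into a contiguous destination. Element i is read from
 * src[i * stride], matching the access model described above. */
static void strided_gather_f32(const float *src, float *dst, int len, int stride)
{
    for (int i = 0; i < len; i++)
        dst[i] = src[(ptrdiff_t)i * stride];
}

/* Hypothetical helper: extract column `col` of a row-major matrix with
 * `cols` columns. Consecutive elements of one column sit `cols` floats
 * apart, so the stride equals the row length. */
static void matrix_column_f32(const float *mat, int rows, int cols,
                              int col, float *out)
{
    strided_gather_f32(mat + col, out, rows, cols);
}
```

For a 2×3 matrix stored as {1, 2, 3, 4, 5, 6}, asking for column 1 reads indices 1 and 4 and yields {2, 5}.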

Performance

The ESP-DSP fast path activates only when every stride is 1. Any non-unit stride forces the generic C loop (roughly 10× slower).


Platform Dispatch Summary

| Function Group | ESP-DSP Call | Generic Fallback |
|---|---|---|
| add, sub, mul | dsps_add/sub/mul_f32 | Strided C loop |
| addc, subc, mulc | dsps_add/sub/mulc_f32 | C loop with scalar |
| div, divc | - | C loop with div-by-zero guard |
| sqrt, inv_sqrt | dsps_sqrt_f32, dsps_sqrtf32_f32 | sqrtf() C loop |
| sqrtf, inv_sqrtf | dsps_sqrtf32_f32 | sqrtf() fast C loop |
| dotprod | dsps_dotprod_f32 | Sum-of-products C loop |
| dotprode | dsps_dotprod_f32 | Sum-of-products with stride |
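The "sum-of-products with stride" fallback listed for the dot product can be sketched as follows. This is a minimal illustration, not the library's source: the function name, the argument order, and the 0 / -1 return convention are assumptions made for the example.

```c
/* Hypothetical stand-in for a strided dot-product fallback.
 * Returns 0 on success and -1 on invalid arguments (assumed convention). */
static int sketch_dotprod_strided_f32(const float *a, const float *b,
                                      float *result, int len,
                                      int stride_a, int stride_b)
{
    /* Validate inputs, following the same pattern as the other functions. */
    if (!a || !b || !result || len <= 0)
        return -1;

    /* Accumulate a[i * stride_a] * b[i * stride_b] over len elements. */
    float acc = 0.0f;
    for (int i = 0; i < len; i++)
        acc += a[i * stride_a] * b[i * stride_b];

    *result = acc;
    return 0;
}
```

With a = {1, 0, 2, 0, 3, 0} read at stride 2 and b = {4, 5, 6} read at stride 1, the accumulated value is 1·4 + 2·5 + 3·6 = 32.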

Division Safety

tiny_vec_div_f32 and tiny_vec_divc_f32 have a zero-division guard:

if (fabsf(denominator) < TINY_MATH_MIN_DENOMINATOR) {
    if (!allow_divide_by_zero) {
        return TINY_ERR_MATH_ZERO_DIVISION;
    }
    output = (numerator >= 0) ? INFINITY : -INFINITY;
}
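
  • allow_divide_by_zero = false → the function returns TINY_ERR_MATH_ZERO_DIVISION (safe default)
  • allow_divide_by_zero = true → the affected output element is set to ±inf, with the sign taken from the numerator (useful for data processing pipelines where occasional zeros are expected)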
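To show how the guard might sit inside a complete element-wise loop, here is a minimal contiguous sketch. Everything prefixed with SKETCH_ or sketch_ is hypothetical: the threshold value, the error-code values, and the function signature are assumptions for illustration, since the actual TINY_MATH_MIN_DENOMINATOR and tiny_error_t definitions are not shown here.

```c
#include <math.h>
#include <stdbool.h>

/* Assumed threshold; the real TINY_MATH_MIN_DENOMINATOR may differ. */
#define SKETCH_MIN_DENOMINATOR 1e-12f

/* Assumed error codes standing in for tiny_error_t values. */
enum {
    SKETCH_OK = 0,
    SKETCH_ERR_INVALID_ARG = -1,
    SKETCH_ERR_ZERO_DIVISION = -2
};

/* Element-wise out[i] = num[i] / den[i] with a zero-division guard. */
static int sketch_vec_div_f32(const float *num, const float *den,
                              float *out, int len,
                              bool allow_divide_by_zero)
{
    if (!num || !den || !out || len <= 0)
        return SKETCH_ERR_INVALID_ARG;

    for (int i = 0; i < len; i++) {
        if (fabsf(den[i]) < SKETCH_MIN_DENOMINATOR) {
            if (!allow_divide_by_zero)
                return SKETCH_ERR_ZERO_DIVISION;
            /* Saturate to +inf or -inf depending on the numerator's sign. */
            out[i] = (num[i] >= 0) ? INFINITY : -INFINITY;
        } else {
            out[i] = num[i] / den[i];
        }
    }
    return SKETCH_OK;
}
```

In this sketch an early error return leaves the elements before the offending index already written, so a caller that needs all-or-nothing behaviour would have to check the denominators first.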

Fast vs Standard Sqrt

TinyVec provides two variants of sqrt and inv_sqrt:

| Variant | ESP-DSP | Precision | Speed |
|---|---|---|---|
| sqrt_f32 | dsps_sqrt_f32 | Full (~1e-7) | |
| sqrtf_f32 | dsps_sqrtf32_f32 | Approx (~1e-4) | 2–3× faster |
| inv_sqrt_f32 | dsps_sqrt_f32 + division | Full | |
| inv_sqrtf_f32 | dsps_sqrtf32_f32 + division | Approx | 2–3× faster |

Use sqrtf / inv_sqrtf in inner loops of iterative algorithms where absolute precision is not required.