Vector: Implementation

File Structure

vec/
├── tiny_vec.h        (51 lines — public API)
└── tiny_vec.c        (723 lines — implementation)

Dependencies: tiny_math_config.h, plus the ESP-DSP headers (dsps_math.h, dsps_dotprod.h) when building for ESP32.

Design Pattern

Every function follows the same three-step pattern:

1. Validate inputs (null pointers, lengths).
2. Dispatch to ESP-DSP when every stride is 1 (contiguous data) and the target platform is ESP32.
3. Fall back to a generic C loop with stride-based indexing.

Example: `tiny_vec_add_f32`

tiny_error_t tiny_vec_add_f32(const float *input1, const float *input2,
                               float *output, int len,
                               int stride1, int stride2, int stride_out)
{
    // 1. Validate
    if (!input1 || !input2 || !output || len <= 0)
        return TINY_ERR_INVALID_ARG;

    // 2. ESP-DSP fast path (contiguous only)
#if MCU_PLATFORM_SELECTED == MCU_PLATFORM_ESP32
    if (stride1 == 1 && stride2 == 1 && stride_out == 1) {
        // ESP-DSP's dsps_add_f32 takes explicit step arguments; all are 1 here
        dsps_add_f32(input1, input2, output, len, 1, 1, 1);
        return TINY_OK;
    }
#endif

    // 3. Generic fallback with stride
    for (int i = 0; i < len; i++)
        output[i * stride_out] = input1[i * stride1] + input2[i * stride2];

    return TINY_OK;
}

Strided Access Model

TinyVec supports strided access for all element-wise operations:

Stride = 1 (contiguous):          Stride = 3 (take every 3rd element):
┌───┬───┬───┬───┬───┬───┐        ┌───┬───┬───┬───┬───┬───┐
│ a │ b │ c │ d │ e │ f │        │ a │   │   │ b │   │   │
│ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │        │ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │
└───┴───┴───┴───┴───┴───┘        └───┴───┴───┴───┴───┴───┘

Element \(i\) is accessed at base[i * stride]. This is useful for extracting sub-vectors from matrix rows or from interleaved data.
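As an illustration of the indexing rule, the sketch below copies one column out of a row-major matrix by using the row width as the stride. Both `strided_copy_f32` and `extract_column_f32` are hypothetical helpers written for this example, with plain integer return codes; they are not part of the TinyVec API.

```c
#include <stddef.h>

/* Hypothetical helper (not part of TinyVec): copies len elements,
 * reading element i from src[i * src_stride] and writing it to
 * dst[i * dst_stride]. Returns 0 on success, -1 on invalid arguments. */
static int strided_copy_f32(const float *src, float *dst, int len,
                            int src_stride, int dst_stride)
{
    if (!src || !dst || len <= 0 || src_stride <= 0 || dst_stride <= 0)
        return -1;

    for (int i = 0; i < len; i++)
        dst[(size_t)i * dst_stride] = src[(size_t)i * src_stride];

    return 0;
}

/* Extracts column `col` of a rows x cols row-major matrix into out[rows].
 * Consecutive elements of a column are `cols` floats apart, so the
 * source stride is `cols` and the base pointer is offset by `col`. */
static int extract_column_f32(const float *matrix, int rows, int cols,
                              int col, float *out)
{
    if (!matrix || !out || rows <= 0 || cols <= 0 || col < 0 || col >= cols)
        return -1;

    return strided_copy_f32(matrix + col, out, rows, cols, 1);
}
```

The same pattern applies to interleaved data: for stereo samples stored as L, R, L, R, a stride of 2 starting at offset 0 or 1 selects one channel.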

Performance

The ESP-DSP fast path activates only when every stride is 1. Any non-unit stride forces the generic C loop, which is roughly 10× slower.

Platform Dispatch Summary

Function Group	ESP-DSP Call	Generic Fallback
`add`, `sub`, `mul`	`dsps_add/sub/mul_f32`	Strided C loop
`addc`, `subc`, `mulc`	`dsps_add/sub/mulc_f32`	C loop with scalar
`div`, `divc`	—	C loop with div-by-zero guard
`sqrt`, `inv_sqrt`	`dsps_sqrt_f32` (plus division for `inv_sqrt`)	`sqrtf()` C loop
`sqrtf`, `inv_sqrtf`	`dsps_sqrtf32_f32`	`sqrtf()` fast C loop
`dotprod`	`dsps_dotprod_f32`	Sum-of-products C loop
`dotprode`	`dsps_dotprod_f32`	Sum-of-products with stride
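The generic fallback listed for the dot-product rows amounts to a strided sum of products. A minimal sketch, assuming a hypothetical function name and plain integer return codes rather than the library's own `tiny_error_t`:

```c
#include <stddef.h>

/* Hypothetical stand-in for a generic dot-product fallback (not the
 * TinyVec implementation): accumulates a[i * stride_a] * b[i * stride_b]
 * for i in [0, len) and stores the sum in *result.
 * Returns 0 on success, -1 on invalid arguments. */
static int strided_dotprod_f32(const float *a, const float *b, float *result,
                               int len, int stride_a, int stride_b)
{
    if (!a || !b || !result || len <= 0 || stride_a <= 0 || stride_b <= 0)
        return -1;

    float acc = 0.0f;
    for (int i = 0; i < len; i++)
        acc += a[(size_t)i * stride_a] * b[(size_t)i * stride_b];

    *result = acc;
    return 0;
}
```

With `stride_a = stride_b = 1` this reduces to the contiguous case that the ESP-DSP column covers.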

Division Safety

tiny_vec_div_f32 and tiny_vec_divc_f32 have a zero-division guard:

if (fabsf(denominator) < TINY_MATH_MIN_DENOMINATOR) {
    if (!allow_divide_by_zero) {
        return TINY_ERR_MATH_ZERO_DIVISION;
    }
    output = (numerator >= 0) ? INFINITY : -INFINITY;
}

allow_divide_by_zero = false → returns error code (safe default)
allow_divide_by_zero = true → returns ±inf (useful for data processing pipelines where occasional zeros are expected)
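To show the guard in context, here is a self-contained sketch of an element-wise division loop built around it. The `SKETCH_` names, the threshold value, and the function signature are assumptions made for this example; the real constants and signatures are defined by the TinyMath headers and may differ.

```c
#include <math.h>
#include <stdbool.h>

/* Assumed values for illustration only; not the library's definitions. */
#define SKETCH_MIN_DENOMINATOR 1e-6f
enum {
    SKETCH_OK = 0,
    SKETCH_ERR_INVALID_ARG = -1,
    SKETCH_ERR_ZERO_DIVISION = -2
};

/* Divides num[i] by den[i] for i in [0, len).
 * A near-zero denominator either aborts with an error code (strict mode)
 * or yields a signed infinity matching the sign of the numerator. */
static int sketch_vec_div_f32(const float *num, const float *den, float *out,
                              int len, bool allow_divide_by_zero)
{
    if (!num || !den || !out || len <= 0)
        return SKETCH_ERR_INVALID_ARG;

    for (int i = 0; i < len; i++) {
        if (fabsf(den[i]) < SKETCH_MIN_DENOMINATOR) {
            if (!allow_divide_by_zero)
                return SKETCH_ERR_ZERO_DIVISION;
            out[i] = (num[i] >= 0.0f) ? INFINITY : -INFINITY;
        } else {
            out[i] = num[i] / den[i];
        }
    }
    return SKETCH_OK;
}
```

In this sketch, strict mode returns as soon as it meets a near-zero denominator, so elements before that index have already been written. A caller that needs all-or-nothing behavior would have to scan the denominators first.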

Fast vs Standard Sqrt

TinyVec provides two variants of sqrt and inv_sqrt:

Variant	ESP-DSP	Precision	Speed
`sqrt_f32`	`dsps_sqrt_f32`	Full (`~1e-7`)	1×
`sqrtf_f32`	`dsps_sqrtf32_f32`	Approx (`~1e-4`)	2–3× faster
`inv_sqrt_f32`	`dsps_sqrt_f32` + division	Full	1×
`inv_sqrtf_f32`	`dsps_sqrtf32_f32` + division	Approx	2–3× faster

Use the sqrtf / inv_sqrtf variants in inner loops of iterative algorithms where full precision is not required.
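To give a feel for the precision-versus-work trade-off, the sketch below uses a well-known bit-level inverse square root approximation refined by Newton-Raphson steps. This technique is swapped in purely for illustration: it is not claimed to be what ESP-DSP or TinyVec does internally, and the function names are hypothetical. It assumes 32-bit IEEE 754 floats and positive, normal inputs.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, normal x.
 * The initial guess comes from integer arithmetic on the float's bit
 * pattern; each Newton-Raphson step then refines it. More steps cost
 * more multiplications and give a more accurate result. */
static float inv_sqrt_approx(float x, int newton_steps)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);      /* reinterpret float as integer */
    bits = 0x5f3759dfu - (bits >> 1);    /* coarse initial estimate */
    memcpy(&y, &bits, sizeof y);

    for (int i = 0; i < newton_steps; i++)
        y = y * (1.5f - 0.5f * x * y * y);

    return y;
}

/* Measures how far y is from 1/sqrt(x) without calling a reference
 * sqrt: for an exact result, x * y * y equals 1. */
static float inv_sqrt_residual(float x, float y)
{
    return fabsf(x * y * y - 1.0f);
}
```

With one refinement step the relative error of this particular approximation is on the order of 1e-3, and a second step reduces it by a few orders of magnitude, which is why such routines suit inner loops that tolerate reduced precision.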