Notes¶
tiny_conv provides 1-D and 2-D convolutional layers. Conv1D is meant for time-series / signal data; Conv2D for images / feature maps. Both use He (Kaiming) normal initialisation (matched to ReLU-like activations) and cache the padded input for backward.
Conv — Scanning the Signal with a Template
One convolution kernel = one feature detector, sliding across the signal/image.
Intuition¶
Conv1D = Sliding Template Matching¶
Imagine you have a template (e.g., an ECG QRS complex). You slide it across the signal, computing a dot product at each position. High dot product → template matches here.
Weight sharing: the same kernel scans the entire signal. No matter where a pattern appears, it's detected the same way.
Key Parameters¶
| Parameter | Meaning | Intuition |
|---|---|---|
| `in_channels` | Input signal channels | 3-ch accelerometer = 3 |
| `out_channels` | Features to learn | 32 = 32 pattern detectors |
| `kernel_size` | Template width | 9 = looks at 9 points at once |
| `stride` | Step size | 2 = every other point |
| `padding` | Zero padding at edges | (k−1)/2 keeps output length unchanged at stride 1, e.g. 4 for kernel 9 |
Conv1D is natural for SHM
Acceleration signals are 1-D time series. Multiple kernels learn different frequency bands/modes.
Shape Change¶
```
Input [B, in_ch, L] → Output [B, out_ch, (L + 2·pad − kernel)/stride + 1]
```
SHAPE CONVENTIONS¶
| Layer | Input | Output |
|---|---|---|
| Conv1D | [batch, in_channels, length] | [batch, out_channels, out_length] |
| Conv2D | [batch, in_channels, height, width] | [batch, out_channels, out_height, out_width] |
Output length: `out_len = (L + 2·pad − kernel) / stride + 1` (floor division).
In 2-D the same formula applies independently to H and W.
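The output-length formula is easy to check in isolation; this is a minimal stand-alone helper (not part of tiny_conv):

```cpp
#include <cassert>

// Output length of a 1-D convolution; in 2-D, apply to H and W independently.
// C++ integer division matches the floor in the formula for non-negative operands.
int conv_out_len(int len, int kernel, int stride = 1, int pad = 0)
{
    return (len + 2 * pad - kernel) / stride + 1;
}
```

For example, `conv_out_len(100, 9, 1, 4)` returns 100: padding (k−1)/2 = 4 preserves the length at stride 1.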
Conv1D¶
Class definition¶
```
class Conv1D : public Layer
{
public:
    Tensor weight; // [out_ch, in_ch, kernel]
    Tensor bias;   // [out_ch]
#if TINY_AI_TRAINING_ENABLED
    Tensor dweight;
    Tensor dbias;
#endif
    Conv1D(int in_channels, int out_channels, int kernel_size,
           int stride = 1, int padding = 0, bool use_bias = true);
    Tensor forward (const Tensor &x) override;
    Tensor backward(const Tensor &grad_out) override;
    void collect_params(std::vector<ParamGroup> &groups) override;
};
```
Forward¶
The implementation builds `xp` (the input zero-padded on both ends) and caches it in `x_cache_` for the backward pass.
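The core loop can be sketched library-free for one batch element (plain nested vectors instead of `Tensor`; names are illustrative, not the tiny_conv internals). Padding is applied virtually via a bounds check rather than by materialising `xp`:

```cpp
#include <cassert>
#include <vector>

// Naive Conv1D forward for one sample: x[in_ch][L] -> y[out_ch][out_len].
// w[oc][ic][k] is the kernel, b[oc] the bias.
std::vector<std::vector<float>> conv1d_forward(
    const std::vector<std::vector<float>> &x,
    const std::vector<std::vector<std::vector<float>>> &w,
    const std::vector<float> &b, int stride, int pad)
{
    const int in_ch   = (int)x.size();
    const int L       = (int)x[0].size();
    const int out_ch  = (int)w.size();
    const int K       = (int)w[0][0].size();
    const int out_len = (L + 2 * pad - K) / stride + 1;

    std::vector<std::vector<float>> y(out_ch, std::vector<float>(out_len));
    for (int oc = 0; oc < out_ch; ++oc)
        for (int t = 0; t < out_len; ++t) {
            float acc = b[oc];
            for (int ic = 0; ic < in_ch; ++ic)
                for (int k = 0; k < K; ++k) {
                    int idx = t * stride + k - pad; // position in the unpadded input
                    if (idx >= 0 && idx < L)        // zero padding: skip out-of-range taps
                        acc += w[oc][ic][k] * x[ic][idx];
                }
            y[oc][t] = acc;
        }
    return y;
}
```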
Backward¶
- `dweight[oc, ic, k] += Σ_{b,t} grad_out[b, oc, t] · x_pad[b, ic, t·s + k]`
- `dbias[oc] += Σ_{b,t} grad_out[b, oc, t]`
- `g_xp[b, ic, t·s + k] += Σ_{oc} grad_out[b, oc, t] · W[oc, ic, k]`, then strip the padding to get `g_x`.
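The same accumulations, sketched library-free for the simplest case (one sample, one input and output channel, stride 1, no padding; illustrative names, not the tiny_conv internals):

```cpp
#include <cassert>
#include <vector>

// Conv1D backward, single in/out channel, stride 1, no padding.
// Fills dweight, dbias, and the input gradient g_x from grad_out.
void conv1d_backward_1ch(const std::vector<float> &x,
                         const std::vector<float> &w,
                         const std::vector<float> &grad_out,
                         std::vector<float> &dweight,
                         float &dbias,
                         std::vector<float> &g_x)
{
    const int K = (int)w.size();
    dweight.assign(K, 0.f);
    g_x.assign(x.size(), 0.f);
    dbias = 0.f;
    for (int t = 0; t < (int)grad_out.size(); ++t) {
        dbias += grad_out[t];                      // bias sees every output position
        for (int k = 0; k < K; ++k) {
            dweight[k] += grad_out[t] * x[t + k];  // dW: correlate grad with input
            g_x[t + k] += grad_out[t] * w[k];      // dX: scatter grad through W
        }
    }
}
```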
He init¶
Box–Muller transforms uniform samples into normal noise; bias is zero-initialised.
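A stand-alone sketch of the idea: draw uniforms, map them to normals with Box–Muller, and scale by the He standard deviation `sqrt(2 / fan_in)`. The `std::mt19937` source and fixed seed are illustrative; the library's uniform source may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Fill w with He-normal samples: std = sqrt(2 / fan_in), normals via Box-Muller.
void he_init(std::vector<float> &w, int fan_in, unsigned seed = 42)
{
    const float two_pi  = 6.28318530718f;
    const float std_dev = std::sqrt(2.0f / (float)fan_in);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> uni(1e-7f, 1.0f); // lower bound avoids log(0)
    for (std::size_t i = 0; i < w.size(); i += 2) {
        // One Box-Muller step turns two uniforms into two independent normals.
        const float r     = std::sqrt(-2.0f * std::log(uni(rng)));
        const float theta = two_pi * uni(rng);
        w[i] = std_dev * r * std::cos(theta);
        if (i + 1 < w.size()) w[i + 1] = std_dev * r * std::sin(theta);
    }
}
```

For a Conv1D with `in_ch = 1`, `kernel = 9`, the fan-in is 9, so the target variance is 2/9.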
Conv2D¶
Class definition¶
```
class Conv2D : public Layer
{
public:
    Tensor weight; // [out_ch, in_ch, kH, kW]
    Tensor bias;   // [out_ch]
#if TINY_AI_TRAINING_ENABLED
    Tensor dweight;
    Tensor dbias;
#endif
    Conv2D(int in_channels, int out_channels, int kH, int kW,
           int sH = 1, int sW = 1, int pH = 0, int pW = 0,
           bool use_bias = true);
    Tensor forward (const Tensor &x) override;
    Tensor backward(const Tensor &grad_out) override;
    void collect_params(std::vector<ParamGroup> &groups) override;
};
```
Parameter naming: `kH`/`kW` kernel height/width, `sH`/`sW` strides, `pH`/`pW` zero padding.
Forward¶
Backward¶
Mirrors Conv1D, looped over (oh, ow). He init variance is 2 / (in_ch · kH · kW).
MEMORY & PERFORMANCE¶
- Param count: `out_ch · in_ch · kH · kW` (+ `out_ch` bias).
- Complexity: `O(B · out_ch · OH · OW · in_ch · kH · kW)`.
- Runtime memory: training requires `x_cache_` (padded input) plus `dweight` & `dbias`.
- PSRAM: when `out_ch · in_ch · K^d ≥ 32 KiB`, place `weight`/`dweight` in PSRAM.
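This bookkeeping is easy to script. A small helper, assuming the 32 KiB threshold is measured in bytes of float32 weights (an assumption; check it against your build):

```cpp
#include <cassert>
#include <cstddef>

// Parameter count of a dense convolution; for Conv1D pass kW = 1.
std::size_t conv_param_count(int out_ch, int in_ch, int kH, int kW,
                             bool use_bias = true)
{
    return (std::size_t)out_ch * in_ch * kH * kW
         + (use_bias ? (std::size_t)out_ch : 0);
}

// Rule of thumb from above: weight tensor >= 32 KiB (float32) -> PSRAM.
bool prefer_psram(std::size_t weight_count)
{
    return weight_count * sizeof(float) >= 32u * 1024u;
}
```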
EXAMPLES¶
1-D signal classification¶
```
Sequential m;
m.add(new Conv1D(1, 8, 5));      // [B,1,L] → [B,8,L-4]
m.add(new ActivationLayer(ActType::RELU));
m.add(new MaxPool1D(2));         // [B,8,(L-4)/2]
m.add(new Conv1D(8, 16, 3));     // [B,16,...]
m.add(new ActivationLayer(ActType::RELU));
m.add(new MaxPool1D(2));
m.add(new Flatten());
m.add(new Dense(flat_dim, 32));
m.add(new ActivationLayer(ActType::RELU));
m.add(new Dense(32, num_classes));
m.add(new ActivationLayer(ActType::SOFTMAX));
```
Or use the CNN1D wrapper (see MODELS/CNN).
Image classification¶
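A 2-D stack mirrors the 1-D example. The sketch below assumes a `MaxPool2D` layer analogous to `MaxPool1D` (check MODELS/CNN for what the library actually ships); `flat_dim` follows from the shapes in the comments.

```
Sequential m;
m.add(new Conv2D(1, 8, 3, 3, 1, 1, 1, 1)); // [B,1,H,W] → [B,8,H,W] ("same" pad)
m.add(new ActivationLayer(ActType::RELU));
m.add(new MaxPool2D(2, 2));                // [B,8,H/2,W/2] (assumed layer)
m.add(new Conv2D(8, 16, 3, 3));            // [B,16,H/2-2,W/2-2]
m.add(new ActivationLayer(ActType::RELU));
m.add(new MaxPool2D(2, 2));
m.add(new Flatten());
m.add(new Dense(flat_dim, 32));
m.add(new ActivationLayer(ActType::RELU));
m.add(new Dense(32, num_classes));
m.add(new ActivationLayer(ActType::SOFTMAX));
```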
LIMITATIONS¶
- No dilation / groups: classic dense convolution only. For depthwise / grouped variants, subclass `Layer`.
- Padding: zero-padding only, symmetric on H and W.
- `stride > kernel`: technically allowed but skips parts of the input; the typical case is `stride ≤ kernel`.
- Training memory: `x_cache_` stores the entire padded input, so keep `B · in_ch · (H+2pH) · (W+2pW)` within budget on ESP32-S3.