Notes

tiny_layer defines the abstract base class Layer shared by every neural-network layer, plus three parameter-free utility layers: ActivationLayer, Flatten, and GlobalAvgPool. These utilities expose the same Layer interface so they can be stacked directly inside a Sequential model.

Layer — The Foundation of Every Neural Network Layer

`tiny::Layer` is the abstract base class. Understand it, and you understand how the entire network connects.

Intuition

Forward Pass = Data Flowing Through the Network

`forward()` takes an input tensor, applies the layer's computation, and outputs the tensor for the next layer.

Think of an assembly line: sensor reading → normalization → feature extraction → classifier → result. Each station is a layer.

Backward Pass (Training)

`backward()` receives the loss gradient w.r.t. the output and computes:

  • the gradient w.r.t. the input (to pass to the previous layer)
  • the gradient w.r.t. the layer's parameters (to update weights)

Gradients flow backward through the network

The last layer receives the gradient from the loss function, and it is passed back through the network layer by layer. Each layer first computes its parameter gradients (to update itself), then passes the input gradient on to the previous layer.
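
A minimal sketch of that chaining, written against the Layer interface documented below (illustrative glue code, not the library's actual Sequential implementation; the backward loop assumes training is enabled):

Tensor run_forward(std::vector<Layer *> &layers, Tensor x)
{
    for (Layer *l : layers)
        x = l->forward(x);              // data flows front to back
    return x;                           // final output goes to the loss
}

Tensor run_backward(std::vector<Layer *> &layers, Tensor grad)
{
    for (auto it = layers.rbegin(); it != layers.rend(); ++it)
        grad = (*it)->backward(grad);   // each layer returns dL/dx for the layer before it
    return grad;
}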

Built-in Utility Layers

| Layer | Function | Shape change |
| --- | --- | --- |
| ActivationLayer | Wraps an activation function | No change |
| Flatten | Flattens multi-dim input to 1D per sample | `[B,C,L] → [B, C*L]` |
| GlobalAvgPool | Averages over the sequence axis | `[B,S,F] → [B,F]` |

Layer (abstract base)

class Layer
{
public:
    const char *name;        // shown in summary()
    bool        trainable;   // does this layer carry learnable parameters?

    explicit Layer(const char *name = "layer", bool trainable = false)
        : name(name), trainable(trainable) {}

    virtual ~Layer() {}

    virtual Tensor forward(const Tensor &x) = 0;

#if TINY_AI_TRAINING_ENABLED
    virtual Tensor backward(const Tensor &grad_out) = 0;
    virtual void   collect_params(std::vector<ParamGroup> &groups) {}
#endif
};

Contract:

  • forward(x): must return a new tensor.
  • backward(grad_out): must return dL/dx, while accumulating gradients into dweight, dbias, … members.
  • collect_params(): only trainable layers override it to push (param, grad) pairs.
  • trainable: lets Sequential::collect_params() skip parameter-free layers (e.g. Flatten).
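
For illustration, gathering parameters over a whole stack can be as small as the sketch below (a hypothetical helper, not the library's Sequential::collect_params; it relies only on the trainable flag and the virtual collect_params shown above):

// Sketch: collect (param, grad) pairs from trainable layers only.
void collect_all_params(std::vector<Layer *> &layers, std::vector<ParamGroup> &groups)
{
    for (Layer *l : layers)
        if (l->trainable)               // Flatten, ActivationLayer, ... contribute nothing
            l->collect_params(groups);
}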

ActivationLayer

Wraps a stateless activation as a Layer so it stacks naturally inside Sequential.

class ActivationLayer : public Layer
{
public:
    explicit ActivationLayer(ActType type, float alpha = 0.01f);
    Tensor forward (const Tensor &x) override;
    Tensor backward(const Tensor &grad_out) override;
};

Implementation notes:

  • Cache strategy: forward() decides what to cache based on the activation type
    • Sigmoid / Tanh / Softmax → cache the output y (used directly by backward).
    • ReLU / LeakyReLU / GELU → cache the input x.
  • alpha defaults to 0.01, applies only to LeakyReLU.
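
The split follows from the derivative formulas themselves. A per-element sketch, assuming x, y, grad_out, and dx are raw float buffers of length n (not the library's Tensor API):

// Sigmoid caches its output y, because dy/dx = y * (1 - y):
for (int i = 0; i < n; ++i)
    dx[i] = grad_out[i] * y[i] * (1.0f - y[i]);

// ReLU caches its input x, because dy/dx = (x > 0) ? 1 : 0:
for (int i = 0; i < n; ++i)
    dx[i] = grad_out[i] * (x[i] > 0.0f ? 1.0f : 0.0f);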

Flatten

Reshapes [batch, ...] into [batch, flat]:

class Flatten : public Layer
{
    Tensor forward (const Tensor &x) override;   // [B, ...] → [B, size/B]
    Tensor backward(const Tensor &grad_out) override;
};

  • flat = size / batch, dispatched via reshape_2d.
  • Backward restores the gradient's original (in_ndim, in_shape).
  • No parameters, trainable = false → ignored by collect_params.
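
A worked example of the shape bookkeeping, using illustrative numbers:

int shape[3] = {8, 16, 32};                    // example input: [B, C, L]
int size  = shape[0] * shape[1] * shape[2];    // 4096 elements
int flat  = size / shape[0];                   // 512 == C * L
// forward : [8, 16, 32] -> [8, 512]
// backward: grad [8, 512] -> restored to [8, 16, 32] from the cached (in_ndim, in_shape)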

Standard CNN1D recipe:

(Conv1D → ReLU → MaxPool1D) × N → Flatten → Dense → Softmax

tiny_cnn.cpp plugs Flatten in right after the conv blocks.

GlobalAvgPool

Mean over the sequence axis, common for Transformer-style outputs ([B, S, F] → [B, F]):

class GlobalAvgPool : public Layer
{
    Tensor forward (const Tensor &x) override;   // [B, S, F] → [B, F]
    Tensor backward(const Tensor &grad_out) override;
};

  • Forward averages over the sequence axis S.
  • Backward distributes grad_out evenly, giving each of the S positions a 1/S share.
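
The underlying arithmetic, sketched on raw float buffers laid out as [B, S, F] (illustrative names, not the library's Tensor API):

// forward: y[b][f] = (1/S) * sum over s of x[b][s][f]
for (int b = 0; b < B; ++b)
    for (int f = 0; f < F; ++f) {
        float sum = 0.0f;
        for (int s = 0; s < S; ++s)
            sum += x[(b * S + s) * F + f];
        y[b * F + f] = sum / S;
    }

// backward: every position s receives an equal 1/S share of the output gradient
for (int b = 0; b < B; ++b)
    for (int s = 0; s < S; ++s)
        for (int f = 0; f < F; ++f)
            dx[(b * S + s) * F + f] = grad_out[b * F + f] / S;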

The Attention example chains it:

Attention([B, S, E]) → GlobalAvgPool([B, E]) → Dense([B, num_classes])

Interaction with Sequential

Sequential m;
m.add(new Dense(F, H));
m.add(new ActivationLayer(ActType::RELU));
m.add(new Dense(H, C));
m.add(new ActivationLayer(ActType::SOFTMAX));

  • Sequential owns every Layer* registered via add() and deletes them in its destructor.
  • forward(x) calls each forward in order; backward(grad_out) calls each backward in reverse.
  • Only trainable == true layers are visited by Sequential::collect_params().

Custom Layer

Subclass Layer and implement forward / backward / collect_params. A skeleton with placeholder bodies:

class Scale : public Layer
{
public:
    Tensor scale;   // [feat]
#if TINY_AI_TRAINING_ENABLED
    Tensor dscale;
#endif

    Scale(int feat) : Layer("scale", true), scale(feat)
    {
        scale.fill(1.0f);
#if TINY_AI_TRAINING_ENABLED
        dscale = Tensor::zeros_like(scale);
#endif
    }

    // Placeholder body: a real forward would multiply each feature of x by scale.
    Tensor forward(const Tensor &x) override { /* x * scale */ return x.clone(); }

#if TINY_AI_TRAINING_ENABLED
    // Placeholder body: a real backward would accumulate dL/dscale into dscale
    // and return grad_out scaled element-wise by scale (= dL/dx).
    Tensor backward(const Tensor &grad_out) override { return grad_out; }
    void collect_params(std::vector<ParamGroup> &g) override
    {
        g.push_back({&scale, &dscale});
    }
#endif
};
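
Because the constructor passes trainable = true to Layer, Sequential::collect_params() picks up the (scale, dscale) pair automatically. Registering the custom layer looks the same as for any built-in one (sketch, reusing the placeholder sizes F and H from the example above):

Sequential m;
m.add(new Dense(F, H));
m.add(new Scale(H));                        // custom layer joins the stack
m.add(new ActivationLayer(ActType::RELU));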