Notes¶
tiny_layer defines the abstract base class Layer shared by every neural-network layer, plus three parameter-free utility layers: ActivationLayer, Flatten, and GlobalAvgPool. These utilities expose the same Layer interface so they can be stacked directly inside a Sequential model.
Layer — The Foundation of Every Neural Network Layer
`tiny::Layer` is the abstract base class. Understand it, and you understand how the entire network connects.
Intuition¶
Forward Pass = Data Flowing Through the Network¶
`forward()` takes an input tensor, applies the layer's computation, and outputs the tensor for the next layer.
Think of an assembly line: sensor reading → normalization → feature extraction → classifier → result. Each station is a layer.
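As a minimal sketch of that assembly line, using only the `Layer` interface shown below, a hand-rolled forward pass is just a chain of `forward()` calls (this is essentially what `Sequential::forward` does for you):

#include <vector>

// Run the input through every station in order; each layer's output
// becomes the next layer's input.
Tensor run(const std::vector<Layer *> &layers, Tensor x)
{
    for (Layer *layer : layers)
        x = layer->forward(x);
    return x; // final prediction
}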
Backward Pass (Training)¶
`backward()` receives the loss gradient w.r.t. the output and computes:
- the gradient w.r.t. the input (to pass to the previous layer)
- the gradient w.r.t. the layer's parameters (to update weights)
Gradients flow backward through the network
The last layer receives the gradient from the loss function and passes it backward layer by layer. Each layer first computes its parameter gradients (to update itself), then passes the input gradient on to the previous layer.
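A mirror-image sketch of that backward sweep, again using only the `Layer` interface (this is essentially what `Sequential::backward` does):

#include <vector>

// grad starts as dL/d(output) from the loss function. Walking the layers
// in reverse, each backward() accumulates its own parameter gradients and
// returns dL/d(input), which becomes grad_out for the previous layer.
Tensor backprop(const std::vector<Layer *> &layers, Tensor grad)
{
    for (auto it = layers.rbegin(); it != layers.rend(); ++it)
        grad = (*it)->backward(grad);
    return grad; // gradient w.r.t. the network's original input
}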
Built-in Utility Layers¶
| Layer | Function | Shape change |
|---|---|---|
| ActivationLayer | Wraps activation function | No change |
| Flatten | Flattens multidim to 1D | `[B,C,L]→[B, C*L]` |
| GlobalAvgPool | Average over spatial dims | `[B,C,L]→[B,C]` |
Layer ABSTRACT BASE¶
class Layer
{
public:
    const char *name; // shown in summary()
    bool trainable;   // does this layer carry learnable parameters?

    explicit Layer(const char *name = "layer", bool trainable = false)
        : name(name), trainable(trainable) {}
    virtual ~Layer() {}

    virtual Tensor forward(const Tensor &x) = 0;
#if TINY_AI_TRAINING_ENABLED
    virtual Tensor backward(const Tensor &grad_out) = 0;
    virtual void collect_params(std::vector<ParamGroup> &groups) {}
#endif
};
Contract:
- `forward(x)`: must return a new tensor.
- `backward(grad_out)`: must return `dL/dx`, while accumulating gradients into `dweight`, `dbias`, … members.
- `collect_params()`: only trainable layers override it to push `(param, grad)` pairs.
- `trainable`: lets `Sequential::collect_params()` skip parameter-free layers (e.g. `Flatten`).
ActivationLayer¶
Wraps a stateless activation as a Layer so it stacks naturally inside Sequential.
class ActivationLayer : public Layer
{
public:
    explicit ActivationLayer(ActType type, float alpha = 0.01f);

    Tensor forward (const Tensor &x) override;
    Tensor backward(const Tensor &grad_out) override;
};
Implementation notes:
- Cache strategy: `forward()` decides what to cache based on the activation type (see the sketch after this list):
  - Sigmoid / Tanh / Softmax → cache the output `y` (used directly by `backward`).
  - ReLU / LeakyReLU / GELU → cache the input `x`.
- `alpha` defaults to 0.01 and applies only to LeakyReLU.
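A minimal sketch of why the cache choice differs, written per element with the standard gradient formulas (plain floats for illustration, not the library's `Tensor` API):

// Sigmoid: dy/dx = y * (1 - y), so the cached output y is all backward needs.
float sigmoid_backward(float grad_out, float cached_y)
{
    return grad_out * cached_y * (1.0f - cached_y);
}

// ReLU: dy/dx is 1 for x > 0 and 0 otherwise, so the input x must be cached.
float relu_backward(float grad_out, float cached_x)
{
    return cached_x > 0.0f ? grad_out : 0.0f;
}

// LeakyReLU: same test, but the negative side is scaled by alpha.
float leaky_relu_backward(float grad_out, float cached_x, float alpha = 0.01f)
{
    return cached_x > 0.0f ? grad_out : alpha * grad_out;
}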
Flatten¶
Reshapes [batch, ...] into [batch, flat]:
class Flatten : public Layer
{
    Tensor forward (const Tensor &x) override; // [B, ...] → [B, size/B]
    Tensor backward(const Tensor &grad_out) override;
};
- `flat = size / batch`, dispatched via `reshape_2d`.
- Backward restores the gradient's original `(in_ndim, in_shape)`.
- No parameters, `trainable = false` → ignored by `collect_params`.
Standard CNN1D recipe:
tiny_cnn.cpp plugs Flatten in right after the conv blocks.
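A rough sketch of that recipe; `Conv1D`, its constructor arguments, `L_out`, and `num_classes` are placeholders for illustration, not the exact API used in tiny_cnn.cpp:

Sequential m;
m.add(new Conv1D(/*in_ch=*/1, /*out_ch=*/8, /*kernel=*/3)); // hypothetical conv layer, [B,1,L] → [B,8,L_out]
m.add(new ActivationLayer(ActType::RELU));
m.add(new Flatten());                                       // [B,8,L_out] → [B, 8*L_out]
m.add(new Dense(8 * L_out, num_classes));                   // classifier head
m.add(new ActivationLayer(ActType::SOFTMAX));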
GlobalAvgPool¶
Mean over the sequence axis, common for Transformer-style outputs ([B, S, F] → [B, F]):
class GlobalAvgPool : public Layer
{
    Tensor forward (const Tensor &x) override; // [B, S, F] → [B, F]
    Tensor backward(const Tensor &grad_out) override;
};
- Forward averages over `s`.
- Backward distributes `grad_out` evenly across the `S` positions (each position receives `grad_out / S`).
The Attention example chains it right after the attention blocks. A rough sketch of that tail end, with `SelfAttention` standing in for whatever attention layer the example actually uses:
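Sequential m;
m.add(new SelfAttention(S, F));               // hypothetical attention layer, output stays [B, S, F]
m.add(new GlobalAvgPool());                   // [B, S, F] → [B, F]
m.add(new Dense(F, num_classes));             // classifier head
m.add(new ActivationLayer(ActType::SOFTMAX));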
Interaction with Sequential¶
Sequential m;
m.add(new Dense(F, H));
m.add(new ActivationLayer(ActType::RELU));
m.add(new Dense(H, C));
m.add(new ActivationLayer(ActType::SOFTMAX));
- `Sequential` owns every `Layer*` registered via `add()` and deletes them in its destructor.
- `forward(x)` calls each `forward` in order; `backward(grad_out)` calls each `backward` in reverse.
- Only `trainable == true` layers are visited by `Sequential::collect_params()`.
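Putting the pieces together, one training step looks roughly like this; `compute_loss_grad` and `apply_sgd` are hypothetical stand-ins for the loss and optimizer, and only the `Sequential` calls come from the notes above:

Tensor y_pred = m.forward(x_batch);                // forward through every layer in order
Tensor grad   = compute_loss_grad(y_pred, y_true); // hypothetical: dL/d(output) from the loss
m.backward(grad);                                  // reverse pass, fills each layer's gradients

std::vector<ParamGroup> groups;
m.collect_params(groups);                          // only trainable layers contribute
apply_sgd(groups, 0.01f);                          // hypothetical optimizer step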
Custom Layer¶
Subclass Layer, implement forward / backward / collect_params. Example:
class Scale : public Layer
{
public:
    Tensor scale;  // [feat], learnable per-feature gain
#if TINY_AI_TRAINING_ENABLED
    Tensor dscale; // gradient buffer filled by backward()
#endif

    Scale(int feat) : Layer("scale", true), scale(feat)
    {
        scale.fill(1.0f);
#if TINY_AI_TRAINING_ENABLED
        dscale = Tensor::zeros_like(scale);
#endif
    }

    // Stub body: a real implementation multiplies each feature of x by scale.
    Tensor forward(const Tensor &x) override { /* x * scale */ return x.clone(); }

#if TINY_AI_TRAINING_ENABLED
    // Stub body: a real implementation accumulates dscale (grad_out * x summed
    // over the batch) and returns grad_out * scale as dL/dx.
    Tensor backward(const Tensor &grad_out) override { return grad_out; }

    void collect_params(std::vector<ParamGroup> &g) override
    {
        g.push_back({&scale, &dscale});
    }
#endif
};
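Once defined, the custom layer stacks like any built-in one; the surrounding layers in this sketch are just for illustration:

Sequential m;
m.add(new Dense(F, F));
m.add(new Scale(F));                       // learnable per-feature gain
m.add(new ActivationLayer(ActType::RELU));
// Because Scale was constructed with trainable = true, Sequential::collect_params()
// visits it and picks up the {&scale, &dscale} pair it pushes.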