Notes¶
tiny_layer defines the abstract base class Layer shared by every neural-network layer, plus three parameter-free utility layers: ActivationLayer, Flatten, and GlobalAvgPool. These utilities expose the same Layer interface so they can be stacked directly inside a Sequential model.
Layer ABSTRACT BASE¶
class Layer
{
public:
const char *name; // shown in summary()
bool trainable; // does this layer carry learnable parameters?
explicit Layer(const char *name = "layer", bool trainable = false)
: name(name), trainable(trainable) {}
virtual ~Layer() {}
virtual Tensor forward(const Tensor &x) = 0;
#if TINY_AI_TRAINING_ENABLED
virtual Tensor backward(const Tensor &grad_out) = 0;
virtual void collect_params(std::vector<ParamGroup> &groups) {}
#endif
};
Contract:
- forward(x): must return a new tensor.
- backward(grad_out): must return dL/dx, while accumulating gradients into dweight, dbias, … members.
- collect_params(): only trainable layers override it to push (param, grad) pairs.
- trainable: lets Sequential::collect_params() skip parameter-free layers (e.g. Flatten).
ActivationLayer¶
Wraps a stateless activation as a Layer so it stacks naturally inside Sequential.
class ActivationLayer : public Layer
{
public:
explicit ActivationLayer(ActType type, float alpha = 0.01f);
Tensor forward (const Tensor &x) override;
Tensor backward(const Tensor &grad_out) override;
};
Implementation notes:
- Cache strategy: forward() decides what to cache based on the activation type.
  - Sigmoid / Tanh / Softmax → cache the output y (used directly by backward).
  - ReLU / LeakyReLU / GELU → cache the input x.
- alpha defaults to 0.01 and applies only to LeakyReLU; the other activation types ignore it.
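The cache strategy above can be illustrated with a minimal sketch. It does not use the library's Tensor or ActivationLayer; `Vec`, `SigmoidSketch`, and `LeakyReluSketch` are hypothetical stand-ins on a flat float buffer, chosen only to show why one activation keeps its output and the other keeps its input.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the library's Tensor: a flat float buffer.
using Vec = std::vector<float>;

// Sigmoid caches its OUTPUT y, because dy/dx = y * (1 - y).
struct SigmoidSketch {
    Vec y_cache;

    Vec forward(const Vec &x) {
        y_cache.resize(x.size());
        for (std::size_t i = 0; i < x.size(); ++i)
            y_cache[i] = 1.0f / (1.0f + std::exp(-x[i]));
        return y_cache;
    }

    Vec backward(const Vec &grad_out) const {
        Vec dx(grad_out.size());
        for (std::size_t i = 0; i < dx.size(); ++i)
            dx[i] = grad_out[i] * y_cache[i] * (1.0f - y_cache[i]);
        return dx;
    }
};

// LeakyReLU caches its INPUT x, because the slope depends on the sign of x.
struct LeakyReluSketch {
    float alpha;
    Vec x_cache;

    explicit LeakyReluSketch(float a = 0.01f) : alpha(a) {}

    Vec forward(const Vec &x) {
        x_cache = x;
        Vec y(x.size());
        for (std::size_t i = 0; i < x.size(); ++i)
            y[i] = x[i] > 0.0f ? x[i] : alpha * x[i];
        return y;
    }

    Vec backward(const Vec &grad_out) const {
        Vec dx(grad_out.size());
        for (std::size_t i = 0; i < dx.size(); ++i)
            dx[i] = grad_out[i] * (x_cache[i] > 0.0f ? 1.0f : alpha);
        return dx;
    }
};
```

Caching the output for sigmoid avoids recomputing the exponential in the backward pass, while LeakyReLU needs only the sign of the input to pick its slope.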
Flatten¶
Reshapes [batch, ...] into [batch, flat]:
class Flatten : public Layer
{
Tensor forward (const Tensor &x) override; // [B, ...] → [B, size/B]
Tensor backward(const Tensor &grad_out) override;
};
- flat = size / batch, dispatched via reshape_2d.
- Backward restores the gradient's original (in_ndim, in_shape).
- No parameters, trainable = false → ignored by collect_params.
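The shape bookkeeping described above can be sketched as follows. This is a shape-only model under stated assumptions: `FlattenSketch` is a hypothetical name, shapes are plain `std::vector<int>`, and the library's reshape_2d is not called.

```cpp
#include <vector>

// Hypothetical shape-only model of Flatten. The data buffer is not modelled,
// since flattening a contiguous tensor only changes its shape metadata.
struct FlattenSketch {
    std::vector<int> in_shape; // saved in forward, restored in backward

    // [B, d1, d2, ...] -> [B, size / B]
    std::vector<int> forward(const std::vector<int> &shape) {
        in_shape = shape;
        int size = 1;
        for (int d : shape)
            size *= d;
        const int batch = shape[0];
        return {batch, size / batch};
    }

    // The gradient arrives as [B, flat] and leaves with the input's shape.
    std::vector<int> backward(const std::vector<int> & /*grad_shape*/) const {
        return in_shape;
    }
};
```

For an input of shape [4, 8, 16], the forward pass yields [4, 128] and the backward pass hands back [4, 8, 16].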
In the standard CNN1D recipe, tiny_cnn.cpp plugs Flatten in right after the conv blocks.
GlobalAvgPool¶
Mean over the sequence axis, common for Transformer-style outputs ([B, S, F] → [B, F]):
class GlobalAvgPool : public Layer
{
Tensor forward (const Tensor &x) override; // [B, S, F] → [B, F]
Tensor backward(const Tensor &grad_out) override;
};
- Forward averages over s.
- Backward distributes grad_out evenly across the S positions.
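A minimal sketch of both passes on a row-major buffer may help make the two bullets concrete. The free functions `gap_forward` and `gap_backward` are hypothetical and assume a contiguous [B, S, F] layout; they are not the library's implementation.

```cpp
#include <cstddef>
#include <vector>

// x is row-major [B, S, F]; the result is [B, F], the mean over the S axis.
std::vector<float> gap_forward(const std::vector<float> &x, int B, int S, int F) {
    std::vector<float> y(static_cast<std::size_t>(B) * F, 0.0f);
    for (int b = 0; b < B; ++b)
        for (int s = 0; s < S; ++s)
            for (int f = 0; f < F; ++f)
                y[b * F + f] += x[(b * S + s) * F + f] / S;
    return y;
}

// grad_out is [B, F]; the result is dL/dx with shape [B, S, F].
// Every one of the S positions receives an equal 1/S share of the gradient.
std::vector<float> gap_backward(const std::vector<float> &grad_out, int B, int S, int F) {
    std::vector<float> dx(static_cast<std::size_t>(B) * S * F);
    for (int b = 0; b < B; ++b)
        for (int s = 0; s < S; ++s)
            for (int f = 0; f < F; ++f)
                dx[(b * S + s) * F + f] = grad_out[b * F + f] / S;
    return dx;
}
```

With B = 1, S = 2, F = 2 and x = {1, 2, 3, 4}, the forward pass gives {2, 3}; a gradient of {1, 2} flows back as {0.5, 1, 0.5, 1}.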
The Attention example chains it:
Interaction with Sequential¶
Sequential m;
m.add(new Dense(F, H));
m.add(new ActivationLayer(ActType::RELU));
m.add(new Dense(H, C));
m.add(new ActivationLayer(ActType::SOFTMAX));
- Sequential owns every Layer* registered via add() and deletes them in its destructor.
- forward(x) calls each forward in order; backward(grad_out) calls each backward in reverse.
- Only trainable == true layers are visited by Sequential::collect_params().
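The three behaviours listed above (ownership, call order, and skipping parameter-free layers) can be sketched with scalar toy layers. Everything here is hypothetical: `LayerSketch`, `Gain`, `PlusOne`, and `SequentialSketch` operate on a single float and collect raw `float*` parameters instead of the library's Tensor and ParamGroup.

```cpp
#include <vector>

// Hypothetical scalar layer interface, mirroring the shape of Layer.
struct LayerSketch {
    bool trainable;
    explicit LayerSketch(bool t = false) : trainable(t) {}
    virtual ~LayerSketch() {}
    virtual float forward(float x) = 0;
    virtual float backward(float grad_out) = 0;
    virtual void collect_params(std::vector<float *> &) {}
};

// Trainable: y = w * x, so dL/dx = g * w and dL/dw = g * x.
struct Gain : LayerSketch {
    float w, dw, x_cache;
    explicit Gain(float w0) : LayerSketch(true), w(w0), dw(0.0f), x_cache(0.0f) {}
    float forward(float x) override { x_cache = x; return w * x; }
    float backward(float g) override { dw += g * x_cache; return g * w; }
    void collect_params(std::vector<float *> &out) override { out.push_back(&w); }
};

// Parameter-free: y = x + 1, so the gradient passes through unchanged.
struct PlusOne : LayerSketch {
    float forward(float x) override { return x + 1.0f; }
    float backward(float g) override { return g; }
};

class SequentialSketch {
    std::vector<LayerSketch *> layers_;

public:
    SequentialSketch() = default;
    SequentialSketch(const SequentialSketch &) = delete; // sole owner of the layers
    SequentialSketch &operator=(const SequentialSketch &) = delete;
    ~SequentialSketch() {
        for (LayerSketch *l : layers_)
            delete l;
    }

    void add(LayerSketch *l) { layers_.push_back(l); }

    float forward(float x) {
        for (LayerSketch *l : layers_) // registration order
            x = l->forward(x);
        return x;
    }

    float backward(float g) {
        for (auto it = layers_.rbegin(); it != layers_.rend(); ++it) // reverse order
            g = (*it)->backward(g);
        return g;
    }

    void collect_params(std::vector<float *> &out) {
        for (LayerSketch *l : layers_)
            if (l->trainable) // parameter-free layers are skipped
                l->collect_params(out);
    }
};
```

Because the container deletes what it was given, layers passed to add() should be heap-allocated and not deleted by the caller.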
Custom Layer¶
Subclass Layer, implement forward / backward / collect_params. Example:
class Scale : public Layer
{
public:
Tensor scale; // [feat]
#if TINY_AI_TRAINING_ENABLED
Tensor dscale;
#endif
Scale(int feat) : Layer("scale", true), scale(feat)
{
scale.fill(1.0f);
#if TINY_AI_TRAINING_ENABLED
dscale = Tensor::zeros_like(scale);
#endif
}
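// Placeholder body: a real implementation would return x * scale (per feature).
Tensor forward(const Tensor &x) override { return x.clone(); }
#if TINY_AI_TRAINING_ENABLED
// Placeholder body: a real implementation would return grad_out * scale
// and accumulate the gradient of scale into dscale.
Tensor backward(const Tensor &grad_out) override { return grad_out; }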
void collect_params(std::vector<ParamGroup> &g) override
{
g.push_back({&scale, &dscale});
}
#endif
};
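The Scale example leaves the forward and backward bodies as placeholders. As a sketch of the math they would carry, the following version works on plain `std::vector<float>` buffers rather than the library's Tensor; `ScaleSketch` and its members are hypothetical, and a row-major [batch, feat] layout is assumed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-feature scaling layer on a row-major [batch, feat] buffer.
struct ScaleSketch {
    std::vector<float> scale;   // [feat], learnable
    std::vector<float> dscale;  // [feat], accumulated gradient
    std::vector<float> x_cache; // input saved for the backward pass

    explicit ScaleSketch(int feat) : scale(feat, 1.0f), dscale(feat, 0.0f) {}

    // y[b, f] = x[b, f] * scale[f]
    std::vector<float> forward(const std::vector<float> &x) {
        x_cache = x;
        std::vector<float> y(x.size());
        for (std::size_t i = 0; i < x.size(); ++i)
            y[i] = x[i] * scale[i % scale.size()];
        return y;
    }

    // dL/dx[b, f]  = grad_out[b, f] * scale[f]
    // dL/dscale[f] += sum over b of grad_out[b, f] * x[b, f]
    std::vector<float> backward(const std::vector<float> &grad_out) {
        std::vector<float> dx(grad_out.size());
        for (std::size_t i = 0; i < grad_out.size(); ++i) {
            const std::size_t f = i % scale.size();
            dx[i] = grad_out[i] * scale[f];
            dscale[f] += grad_out[i] * x_cache[i];
        }
        return dx;
    }
};
```

Note that the gradient is accumulated with `+=`, matching the contract that backward accumulates into the gradient members, so the caller is responsible for zeroing dscale between optimisation steps.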