Notes¶
MLP is a convenience wrapper around Sequential for Multi-Layer Perceptrons. Pass a list of dimensions {in, h1, h2, ..., out}; it auto-inserts Dense + activation layers and optionally a final Softmax.
MLP — Multi-Layer Perceptron: Universal Function Approximator
MLP = stacked Dense + Activation layers. With enough hidden neurons it can approximate any continuous function arbitrarily well (the universal approximation theorem).
Intuition¶
Universal Function Approximator¶
Structure: `Input → [Dense → ReLU] × N → Dense (output)`
vs Hand-Crafted Features¶
MLP automatically learns what features matter, instead of requiring manual feature engineering.
IRIS Example¶
```
Dense(4→64) → ReLU → Dense(64→3) → Softmax
```
Input: 4 flower measurements → Output: 3 Iris species.
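With the constructor described below, this pipeline is a one-liner (ReLU hidden activation and the final Softmax are the defaults):

```
MLP iris({4, 64, 3});  // Dense(4→64) → ReLU → Dense(64→3) → Softmax
```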
CLASS DEFINITION¶
```
class MLP : public Sequential
{
public:
    MLP(std::initializer_list<int> dims,
        ActType hidden_act  = ActType::RELU,
        bool    use_softmax = true,
        bool    use_bias    = true);

    int in_features()  const;
    int out_features() const;
};
```
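Both accessors follow directly from the dims list (first and last entry); a quick illustrative check:

```
MLP model({4, 16, 8, 3});
// model.in_features()  == 4   (first entry of dims)
// model.out_features() == 3   (last entry of dims)
```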
CONSTRUCTION LOGIC¶
For dims = {d0, d1, ..., d_{N-1}}, loop i = 0..N-2:
- `add(new Dense(d_i, d_{i+1}, use_bias))`
- If `i < N-2`: `add(new ActivationLayer(hidden_act))` (hidden activation)
- If `i == N-2` and `use_softmax == true`: `add(new ActivationLayer(SOFTMAX))` (output softmax)
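Putting the loop together, a minimal sketch of the constructor body (an illustrative reconstruction, not the library's actual source; assumes `add()` takes ownership of raw `Layer*` pointers and that the `SOFTMAX` above is `ActType::SOFTMAX`):

```
MLP::MLP(std::initializer_list<int> dims,
         ActType hidden_act, bool use_softmax, bool use_bias)
{
    const int* d = dims.begin();
    const int  N = static_cast<int>(dims.size());
    for (int i = 0; i < N - 1; ++i) {
        add(new Dense(d[i], d[i + 1], use_bias));        // Dense d_i → d_{i+1}
        if (i < N - 2)
            add(new ActivationLayer(hidden_act));        // hidden activation
        else if (use_softmax)
            add(new ActivationLayer(ActType::SOFTMAX));  // output softmax
    }
}
```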
EXAMPLE¶
```
// Iris: 4 → 16 → 8 → 3
MLP model({4, 16, 8, 3}, ActType::RELU, /*use_softmax=*/true);
model.summary();
/* Sequential model (6 layers)
   --------------------
   [ 0] dense
   [ 1] activation   // ReLU
   [ 2] dense
   [ 3] activation   // ReLU
   [ 4] dense
   [ 5] activation   // SOFTMAX
   --------------------
*/

Adam opt(1e-3f);
Trainer trainer(&model, &opt, LossType::CROSS_ENTROPY);
trainer.fit(train_data, cfg);
```
SHOULD I DISABLE SOFTMAX?¶
`cross_entropy_forward` already applies softmax internally, so keeping or dropping the model's final softmax layer does not change the loss value. However:
- Need probabilities from `model.predict()` / `accuracy()` → keep `use_softmax = true`.
- Need raw logits (for temperature scaling, distillation, …) → `use_softmax = false`.
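For the logits case, just pass `use_softmax = false` to the constructor (a hypothetical distillation-flavored usage; only the constructor signature comes from this page):

```
// Last layer is Dense(8→3); outputs stay as unnormalized logits,
// e.g. for temperature scaling or knowledge distillation.
MLP student({4, 16, 8, 3}, ActType::RELU, /*use_softmax=*/false);
```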
PARAMETER COUNT¶
For dims = {F, h1, h2, ..., C}, each Dense layer contributes `d_i * d_{i+1}` weights plus `d_{i+1}` biases (with `use_bias = true`):
E.g. {4, 16, 8, 3} → 4*16 + 16 + 16*8 + 8 + 8*3 + 3 = 243 floats ≈ 1 KB. Even with training (Adam's extra m/v buffers triple the footprint) the budget stays below ~3 KB, which easily fits in ESP32-S3 internal SRAM.
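A tiny stand-alone helper to verify the arithmetic (hypothetical, not part of the library):

```
#include <cstdio>
#include <vector>

// Count Dense parameters for a dims list, assuming every layer has a bias.
static int mlp_param_count(const std::vector<int>& dims)
{
    int total = 0;
    for (size_t i = 0; i + 1 < dims.size(); ++i)
        total += dims[i] * dims[i + 1] + dims[i + 1];  // weights + biases
    return total;
}

int main()
{
    std::printf("%d\n", mlp_param_count({4, 16, 8, 3}));  // prints 243
}
```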
RELATION TO Sequential¶
MLP only overrides the constructor; forward / backward / summary / predict / accuracy come straight from Sequential. For richer topologies (residuals, branches, skip connections), use Sequential directly or subclass it.
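For reference, the same Iris model hand-rolled on Sequential, using the `add()` / `Dense` / `ActivationLayer` calls from CONSTRUCTION LOGIC above (a sketch; `ActType::SOFTMAX` is assumed to be the enum value spelled `SOFTMAX` there):

```
Sequential net;
net.add(new Dense(4, 16, /*use_bias=*/true));
net.add(new ActivationLayer(ActType::RELU));
net.add(new Dense(16, 8, true));
net.add(new ActivationLayer(ActType::RELU));
net.add(new Dense(8, 3, true));
net.add(new ActivationLayer(ActType::SOFTMAX));  // omit for raw logits
```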