Notes
Demo overview
example_mlp now has two parts: the main Iris MLP flow (training + INT8 PTQ demo) and a new BatchNorm1D demo that explicitly shows training/inference mode switching.
DATA SOURCE

Data comes from example/data/iris_data.hpp:

```cpp
namespace tiny_data
{
    constexpr int IRIS_N_SAMPLES  = 150;
    constexpr int IRIS_N_FEATURES = 4;
    constexpr int IRIS_N_CLASSES  = 3;

    extern const float IRIS_X[IRIS_N_SAMPLES][IRIS_N_FEATURES];
    extern const int   IRIS_Y[IRIS_N_SAMPLES];
}
```
IRIS_X and IRIS_Y are embedded as static const arrays in the firmware — no file I/O required.
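Because the arrays are plain constants, they can be consumed directly with no runtime loading. A minimal sanity-check sketch; print_class_counts is a hypothetical helper (not part of the demo), and the include path is assumed relative to example/:

```cpp
// Hypothetical helper: count samples per class using only the
// declarations from iris_data.hpp shown above.
#include "data/iris_data.hpp"   // path assumed relative to example/
#include <cstdio>

void print_class_counts(void)
{
    int counts[tiny_data::IRIS_N_CLASSES] = {0};
    for (int i = 0; i < tiny_data::IRIS_N_SAMPLES; ++i)
        ++counts[tiny_data::IRIS_Y[i]];
    for (int c = 0; c < tiny_data::IRIS_N_CLASSES; ++c)
        std::printf("class %d: %d samples\n", c, counts[c]);
}
```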
MODEL

The Sequential model expands to six layers: three dense layers, each followed by an activation (see the layer list in the measured output below).
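The exact definition is not reproduced in these notes; below is a hedged reconstruction from the printed layer list. The hidden width (16), the ReLU activations, and the Sequential/Dense/Activation constructor shapes are all assumptions; only the layer order and the 4-in / 3-out shape are confirmed:

```cpp
// Hedged reconstruction only: layer order and shapes come from the
// printed summary; HIDDEN = 16, ReLU, and the constructor signatures
// are illustrative guesses, not the library's actual API.
constexpr int HIDDEN = 16;

Sequential model;
model.add(Dense(tiny_data::IRIS_N_FEATURES, HIDDEN)); // [0] dense
model.add(Activation(ActType::RELU));                 // [1] activation
model.add(Dense(HIDDEN, HIDDEN));                     // [2] dense
model.add(Activation(ActType::RELU));                 // [3] activation
model.add(Dense(HIDDEN, tiny_data::IRIS_N_CLASSES));  // [4] dense
model.add(Activation(ActType::RELU));                 // [5] activation (type unknown)
```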
TRAINING CONFIG

- Split: dataset.split(0.2f, train_ds, test_ds, 42) → 120 train / 30 test.
- Optimiser: Adam(lr=1e-3, β1=0.9, β2=0.999).
- Loss: LossType::CROSS_ENTROPY (raw logits in; softmax baked into the loss).
- Hyper: epochs=100, batch_size=16, print_every=20.
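A minimal sketch of this setup; the Adam constructor argument order (lr, beta1, beta2) is an assumption based on the listed values:

```cpp
// Split and optimiser setup per the config above. Only the values are
// confirmed by these notes; the Adam argument order is assumed.
Dataset train_ds, test_ds;
dataset.split(0.2f, train_ds, test_ds, 42); // 20% test split, seed 42 -> 120/30

Adam opt(1e-3f, 0.9f, 0.999f);              // lr, beta1, beta2
```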
TRAINING FLOW

```cpp
Trainer trainer(&model, &opt, LossType::CROSS_ENTROPY);

Trainer::Config cfg;
cfg.epochs      = 100;
cfg.batch_size  = 16;
cfg.print_every = 20;

trainer.fit(train_ds, cfg, &test_ds);

float train_acc = trainer.evaluate_accuracy(train_ds);
float test_acc  = trainer.evaluate_accuracy(test_ds);
```
Trainer::fit will:

- On first call, run model.collect_params(params) and opt.init(params).
- Shuffle the training set every epoch via train_ds.shuffle(epoch + 1).
- Loop with batch_size = 16, calling next_batch, then forward / loss forward / loss backward / model.backward / opt.step.
- Print loss + val_acc every print_every epochs.
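Expressed as pseudocode, the loop looks roughly like the sketch below. This is illustrative only, not the Trainer source; Batch, logits, grad, and the loss object's method shapes are assumptions:

```cpp
// Illustrative shape of Trainer::fit (not the actual implementation).
for (int epoch = 0; epoch < cfg.epochs; ++epoch) {
    train_ds.shuffle(epoch + 1);                 // deterministic per-epoch shuffle
    Batch batch;                                 // assumed mini-batch container
    while (train_ds.next_batch(batch, cfg.batch_size)) {
        model.forward(batch.x, logits);          // raw logits (softmax lives in the loss)
        float loss = ce.forward(logits, batch.y);
        ce.backward(logits, batch.y, grad);      // dLoss/dLogits
        model.backward(grad);                    // backprop through every layer
        opt.step();                              // Adam parameter update
        (void)loss;
    }
    if ((epoch + 1) % cfg.print_every == 0) {
        // print loss and val_acc (computed on the validation set)
    }
}
```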
INT8 PTQ DEMO

```cpp
QuantParams wp = calibrate(demo_w, TINY_DTYPE_INT8);

int8_t *w_int8 = (int8_t *)TINY_AI_MALLOC(demo_w.size);

tiny_quant_params_t cp = wp.to_c();
tiny_quant_f32_to_int8(demo_w.data, w_int8, demo_w.size, &cp);
```
The demo prints scale / zero_point / quantised value / dequantised value, so you can see the symmetric quantisation precision visually. For full integer inference, see tiny_quant_dense_forward_int8 in QUANT/INT/notes.
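To reproduce the printed numbers by hand, symmetric dequantisation is f = scale * (q - zero_point). A minimal sketch; the scale and zero_point field names on tiny_quant_params_t are assumptions inferred from the printout:

```cpp
// Round-trip check for the first weight (field names assumed).
float dq0 = cp.scale * (float)(w_int8[0] - cp.zero_point);
printf("w[0]=%.4f  quantised=%d  dequantised=%.4f\n",
       demo_w.data[0], (int)w_int8[0], dq0);
```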
New: BatchNorm1D demo

After the main pipeline, example_mlp calls batchnorm1d_demo(train_ds, test_ds) (only when TINY_AI_TRAINING_ENABLED is set):
Key behaviors:

- model.set_training_mode(true): BN uses current-batch stats and updates running_mean / running_var.
- model.set_training_mode(false): BN switches to running stats for inference.
- The demo prints the first BN layer's leading running_mean / running_var values for quick sanity checks.
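A minimal sketch of the mode-switch sequence, using only calls shown in these notes (model construction and printing omitted):

```cpp
// Train-mode eval: BN normalises with current-batch stats (and, per
// these notes, updates running_mean / running_var as a side effect).
model.set_training_mode(true);
float bn_train_acc = trainer.evaluate_accuracy(train_ds);
float bn_test_acc  = trainer.evaluate_accuracy(test_ds);

// Inference-mode eval: BN switches to the accumulated running stats.
model.set_training_mode(false);
float inf_train_acc = trainer.evaluate_accuracy(train_ds);
float inf_test_acc  = trainer.evaluate_accuracy(test_ds);
```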
Measured output (2026-04-30)

```text
========================================
tiny_ai | MLP Example (Iris)
========================================
Dataset split: 120 train / 30 test
Sequential model (6 layers)
--------------------
[ 0] dense
[ 1] activation
[ 2] dense
[ 3] activation
[ 4] dense
[ 5] activation
--------------------
Training...
Epoch [ 20/100] loss: 0.820218 val_acc: 0.7667
Epoch [ 40/100] loss: 0.690300 val_acc: 0.9333
Epoch [ 60/100] loss: 0.629859 val_acc: 0.9000
Epoch [ 80/100] loss: 0.602956 val_acc: 0.9000
Epoch [100/100] loss: 0.586117 val_acc: 0.9000
--- Float32 Results ---
Train accuracy: 97.50%
Test accuracy: 90.00%
--- INT8 PTQ Inference ---
Quantisation demo: calibrating weight tensor...
Weight scale = 0.006299 zero_point = 0
Original w[0]=-0.8000 Quantised=-127 Dequantised=-0.8000
INT8 accuracy (whole dataset): 8.67%
[BN1D] training_mode = ON
[train mode] Train: 97.50% Test: 80.00%
[BN1D] training_mode = OFF (running stats active)
[infer mode] Train: 99.17% Test: 96.67%
BN1D[0] running_mean (first 4): -0.0179 -0.0019 0.0345 0.0276
BN1D[0] running_var (first 4): 0.6317 0.1680 0.9003 0.7524
example_mlp DONE
```
Interpretation

- The FP32 main pipeline converges normally: Train 97.50% / Test 90.00%, a reasonable Iris result.
- INT8 = 8.67% is not the real quantised accuracy: the current run_int8_inference() predicts on X_test but compares against the full-dataset labels IRIS_Y; this mismatched pairing invalidates the reported metric (see the sketch after this list).
- BN1D inference mode outperforms train-mode eval: test accuracy improves from 80.00% to 96.67% when switching to running stats, which matches expected BatchNorm behavior.
- Running stats are non-trivial: the printed running_mean / running_var values confirm BN stats are actually being accumulated.
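For the INT8 metric, a hedged sketch of the pairing fix: score test-set predictions against test-set labels. int8_predict, x_test, y_test, and n_test are placeholder names standing in for whatever run_int8_inference() uses internally, not library API:

```cpp
// Hypothetical corrected scoring: predictions made on X_test must be
// compared with the test labels, not with the full-dataset IRIS_Y.
int correct = 0;
for (int i = 0; i < n_test; ++i) {
    int pred = int8_predict(x_test[i]);  // per-sample INT8 forward pass
    if (pred == y_test[i]) ++correct;    // test labels paired with test inputs
}
printf("INT8 test accuracy: %.2f%%\n", 100.0f * correct / (float)n_test);
```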
ENTRY POINT
tiny_ai.h exposes void example_mlp(void) for both C and C++ callers:
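The extern "C" guard is implied by "both C and C++ callers"; a likely shape (exact macros may differ):

```cpp
// Probable declaration shape in tiny_ai.h; inferred from the notes,
// not copied from the source header.
#ifdef __cplusplus
extern "C" {
#endif

void example_mlp(void);

#ifdef __cplusplus
}
#endif
```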