Loss — Measuring the Gap Between Prediction and Truth¶
The loss function tells you how wrong the model currently is. Training = minimizing this value.
tiny_loss ships four common loss functions: Mean Squared Error (MSE), Mean Absolute Error (MAE), Softmax + Cross-Entropy, and Binary Cross-Entropy. Every loss exposes a scalar value in the forward pass and a gradient tensor in the backward pass.
Intuition¶
Loss = Penalty¶
Model predicts \(\hat{y}\), ground truth is \(y\). The loss \(L(\hat{y}, y)\) outputs:
- 0 = perfect prediction
- Larger = more wrong
Common Losses¶
| Loss | Formula | Use case | Why this design |
|---|---|---|---|
| MSE | mean of \((\hat{y} - y)^2\) | Regression (numeric prediction) | Squares penalize large errors—2× off = 4× cost |
| MAE | mean of \(\lvert\hat{y} - y\rvert\) | Robust regression | Linear penalty; outliers don't dominate the gradient |
| CrossEntropy | \(-\sum y \cdot \log(\hat{y})\) | Classification (probabilities) | Worst when model is confidently wrong |
| BinaryCE | \(-[y \log \hat{y} + (1-y)\log(1-\hat{y})]\) | Binary classification | Cross-entropy specialized to two classes |
CrossEntropy intuition
True label is cat (`[1,0,0]`). Model predicts `[0.8, 0.1, 0.1]` → small loss. Predicts `[0.3, 0.6, 0.1]` → large loss. Confidently wrong = heaviest penalty.
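A quick numeric check of that intuition, in plain C++ rather than the tiny_loss API: with a one-hot target, cross-entropy collapses to \(-\log \hat{y}_{\text{true class}}\).

```cpp
#include <cmath>
#include <cstdio>

// With a one-hot target, cross-entropy reduces to -log(p), where p is the
// probability the model assigned to the true class.
int main()
{
    std::printf("confident & right: -log(0.8) = %.3f\n", -std::log(0.8)); // ~0.223
    std::printf("confidently wrong: -log(0.3) = %.3f\n", -std::log(0.3)); // ~1.204
}
```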
Gradient = Direction Downhill¶
The loss gradient \(\partial L / \partial \text{params}\) tells the optimizer:
- Direction: which way to adjust parameters to reduce loss
- Magnitude: how sensitive the loss is to each parameter
This is the essence of backpropagation and gradient descent.
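To make that concrete, here is a minimal sketch of one gradient-descent step; the names are hypothetical and not part of the tiny_loss API.

```cpp
#include <cstddef>
#include <vector>

// One step of vanilla gradient descent: nudge each parameter against its
// gradient, scaled by the learning rate. `params` and `grads` are
// hypothetical flat views of the model's parameters and their gradients.
void sgd_step(std::vector<float> &params, const std::vector<float> &grads,
              float lr)
{
    for (std::size_t i = 0; i < params.size(); ++i)
        params[i] -= lr * grads[i]; // minus sign = move downhill
}
```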
LossType ENUM¶
```cpp
enum class LossType
{
    MSE = 0,        // Mean Squared Error
    MAE,            // Mean Absolute Error
    CROSS_ENTROPY,  // Softmax + Cross-Entropy (input = raw logits)
    BINARY_CE       // Binary CE (input = sigmoid probabilities)
};
```
MATH¶
MSE¶
\[
L = \frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2,
\qquad
\frac{\partial L}{\partial \hat{y}_i} = \frac{2}{N}\,(\hat{y}_i - y_i)
\]
MAE¶
\[
L = \frac{1}{N}\sum_{i=1}^{N} \lvert\hat{y}_i - y_i\rvert,
\qquad
\frac{\partial L}{\partial \hat{y}_i} = \frac{1}{N}\,\operatorname{sign}(\hat{y}_i - y_i)
\]
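To make both formulas concrete, a hand-computed example on a two-element batch (plain C++, independent of the Tensor API):

```cpp
#include <cmath>
#include <cstdio>

// Hand-computed MSE and MAE on a 2-element batch.
int main()
{
    const float pred[2]   = {2.5f, 0.0f};
    const float target[2] = {3.0f, -1.0f};

    float mse = 0.0f, mae = 0.0f;
    for (int i = 0; i < 2; ++i)
    {
        const float d = pred[i] - target[i];
        mse += d * d;        // 0.25 + 1.0
        mae += std::fabs(d); // 0.5  + 1.0
    }
    std::printf("MSE = %.3f\n", mse / 2); // 0.625
    std::printf("MAE = %.3f\n", mae / 2); // 0.750
}
```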
Cross-Entropy (numerically stable, expects logits)¶
cross_entropy_forward consumes raw logits and uses the log-sum-exp trick:
\[
L = \frac{1}{N}\sum_{n=1}^{N}\Big(\log\sum_{j} e^{\,z_{n,j}-m_n} + m_n - z_{n,y_n}\Big),
\qquad m_n = \max_j z_{n,j}
\]
where \(z_{n,j}\) is the logit for class \(j\) in sample \(n\) and \(y_n\) is the label index.
Its gradient is softmax(logits) - one_hot(labels) divided by the batch size:
\[
\frac{\partial L}{\partial z_{n,j}} = \frac{\operatorname{softmax}(z_n)_j - \mathbb{1}[\,j = y_n\,]}{N}
\]
Label format
cross_entropy_* takes int* labels (length = batch); each entry is a class index in [0, num_classes), not a one-hot tensor.
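A minimal sketch of the stable forward pass over flat arrays, assuming row-major [batch × classes] logits; the real cross_entropy_forward operates on tiny_loss's Tensor type.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable cross-entropy over raw logits, following the
// log-sum-exp trick above. `labels` holds one class index per row.
float cross_entropy_sketch(const std::vector<float> &logits,
                           const std::vector<int> &labels,
                           int batch, int classes)
{
    float loss = 0.0f;
    for (int n = 0; n < batch; ++n)
    {
        const float *z = &logits[n * classes];

        float m = z[0]; // subtract the row max so exp() cannot overflow
        for (int j = 1; j < classes; ++j)
            m = std::max(m, z[j]);

        float sum = 0.0f;
        for (int j = 0; j < classes; ++j)
            sum += std::exp(z[j] - m);

        // -log softmax(z)[y] = log(sum_j exp(z_j - m)) + m - z_y
        loss += std::log(sum) + m - z[labels[n]];
    }
    return loss / static_cast<float>(batch);
}
```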
Binary CE¶
Inputs are sigmoid probabilities pred ∈ (0, 1); targets are 0/1:
\[
L = -\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i \log(\hat{y}_i + \varepsilon) + (1 - y_i)\log(1 - \hat{y}_i + \varepsilon)\Big]
\]
For numerical stability, \(\varepsilon\) = TINY_MATH_MIN_POSITIVE_INPUT_F32 is added inside the log to avoid log(0).
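A per-element sketch of that expression; `eps` stands in for TINY_MATH_MIN_POSITIVE_INPUT_F32, whose actual value is an assumption here.

```cpp
#include <cmath>

// Binary cross-entropy for one (probability, target) pair. `eps` is a
// placeholder for TINY_MATH_MIN_POSITIVE_INPUT_F32; its real value may differ.
float binary_ce_one(float pred, float target, float eps = 1e-12f)
{
    return -(target * std::log(pred + eps)
             + (1.0f - target) * std::log(1.0f - pred + eps));
}
```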
API OVERVIEW¶
```cpp
float  mse_forward           (const Tensor &pred,   const Tensor &target);
Tensor mse_backward          (const Tensor &pred,   const Tensor &target);
float  mae_forward           (const Tensor &pred,   const Tensor &target);
Tensor mae_backward          (const Tensor &pred,   const Tensor &target);
float  cross_entropy_forward (const Tensor &logits, const int *labels);
Tensor cross_entropy_backward(const Tensor &logits, const int *labels);
float  binary_ce_forward     (const Tensor &pred,   const Tensor &target);
Tensor binary_ce_backward    (const Tensor &pred,   const Tensor &target);
```
Dispatch helpers¶
```cpp
float  loss_forward (const Tensor &pred, const Tensor &target,
                     LossType type, const int *labels = nullptr);
Tensor loss_backward(const Tensor &pred, const Tensor &target,
                     LossType type, const int *labels = nullptr);
```
Trainer plugs the loss in via loss_forward / loss_backward + LossType, so the loss is fully swappable.
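Internally, the dispatch plausibly reduces to a switch over LossType. A sketch of that shape, not the actual tiny_loss source:

```cpp
// Hypothetical body of loss_forward: route on LossType to the matching
// per-loss function declared above.
float loss_forward_sketch(const Tensor &pred, const Tensor &target,
                          LossType type, const int *labels = nullptr)
{
    switch (type)
    {
    case LossType::MSE:           return mse_forward(pred, target);
    case LossType::MAE:           return mae_forward(pred, target);
    case LossType::CROSS_ENTROPY: return cross_entropy_forward(pred, labels);
    case LossType::BINARY_CE:     return binary_ce_forward(pred, target);
    }
    return 0.0f; // unreachable for valid enum values
}
```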
RECOMMENDATIONS¶
| Scenario | Loss | Final layer |
|---|---|---|
| Multi-class classification | CROSS_ENTROPY | Dense (raw logits — softmax is built into the loss) |
| Binary classification | BINARY_CE | Dense + Sigmoid |
| Regression | MSE | Dense |
| Robust regression | MAE | Dense |
Softmax + Cross-Entropy
cross_entropy_forward already contains softmax, so the model's last activation can be ActType::LINEAR (or omitted). MLP / CNN1D default to use_softmax = true mostly because predict() / accuracy() need probabilities downstream; feel free to disable it if you don't need them.