A 57-parameter trained transformer that performs 10-digit addition with 100% accuracy. Built for the AdderBoard challenge.
Interactive model | Paper (not a formal scientific paper — it's a weekend project)
57 parameters, 100% accuracy (10,010/10,010). Trained from scratch — no warm-starting, no frozen pretrained values. Sinusoidal positional encoding (0 learned position parameters).
| Params | Accuracy | Seed | Key innovations | Training |
|---|---|---|---|---|
| 57 | 100% | 777 | triple-duty head_proj (V/output/fc2 tied) | carry_mix=0.8, step-fade, 60K steps |
| 67 | 100% | 71046 | sinusoidal pos, qk_dim=4 | carry_mix=0.8, step-fade, 60K steps |
| 74 | 100% | 45214, 71046, 78988 | learned spiral, qk_dim=5 | carry_mix=0.8, step-fade, 120K steps |
| 242 | 100% | (baseline) | — | JackCai1206's architecture |
Single-layer autoregressive decoder. 5-dimensional residual stream (2D token + 3D position). One attention head with 4D Q/K subspace and 5D value pathway.
- d_model = 5 = tok_dim(2) + pos_dim(3)
- 1 layer, 1 head, qk_dim = 4, head_dim = 5
- FFN dim = 2 (no bias, GELU, fc2 tied to head_proj)
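The dimensions above can be checked with a shape-level sketch (NumPy, random weights). Weight tying and the shared RMSNorm are simplified away, and all variable names here are assumptions, not the repo's actual modules:

```python
import numpy as np

rng = np.random.default_rng(0)
TOK_DIM, POS_DIM = 2, 3
D = TOK_DIM + POS_DIM                      # d_model = 5
QK, HEAD, FFN = 4, 5, 2

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gelu(x):  # tanh approximation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

T = 12                                     # sequence length
x = rng.normal(size=(T, D))                # residual stream
W_qk = rng.normal(size=(D, QK))            # tied Q/K (phase rotation omitted)
W_v = rng.normal(size=(D, HEAD))           # head_proj (triple-duty in the real model)
A = rng.normal(size=(HEAD, 1))             # rank-1 out_proj:
B = rng.normal(size=(1, D))                #   A(5x1) @ B(1x5)
W1 = rng.normal(size=(D, FFN))             # fc1, no bias
W2 = rng.normal(size=(FFN, D))             # fc2 (tied to head_proj in the real model)

q = k = x @ W_qk
mask = np.triu(np.full((T, T), -np.inf), 1)           # causal mask
att = softmax(q @ k.T / np.sqrt(QK) + mask)
x = x + (att @ (x @ W_v)) @ A @ B                     # attention + rank-1 output
x = x + gelu(x @ W1) @ W2                             # 2-dim FFN
assert x.shape == (T, D)
```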
Every component is compressed:
| Component | Params | Innovation |
|---|---|---|
| Token embeddings | 3 | Circular arc: A·cos(s+d·stride), A·sin(s+d·stride) |
| Positions | 0 | Frozen sinusoidal (amp=3.5, phase=0, slope=0.15) |
| Carry position | 3 | Learned (high norm, far from digits) |
| EQUALS position | 3 | Learned (PLUS/EOS frozen at zero) |
| Q/K projection | 12+1 | Tied Q/K with phase rotation (3→4) |
| Attention output | 10 | Rank-1 factorization: A(5×1) @ B(1×5) |
| FFN fc1 | 10 | 5→2, no bias, GELU |
| Output head | 10 | Triple-duty: V projection + output classifier + FFN expansion (fc2) |
| Normalization | 5 | Single RMSNorm weight shared across all 3 norms |
| Total | 57 | |
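The accounting in the table can be reproduced directly; the per-component counts below come from the table, while the circular-arc constants (A, s, stride) are illustrative, not the trained values:

```python
import math

# Per-component parameter counts from the table above.
components = {
    "token embeddings (A, s, stride)": 3,
    "positions (frozen sinusoidal)": 0,
    "carry position": 3,
    "equals position": 3,
    "q/k projection (12 tied + 1 phase)": 13,
    "attention output (rank-1 A @ B)": 10,
    "ffn fc1 (5x2, no bias)": 10,
    "output head (triple-duty)": 10,
    "shared rmsnorm weight": 5,
}
assert sum(components.values()) == 57

# The 3-parameter circular-arc token embedding: digit d maps to a 2-D
# point on a circle of radius A, stepped by `stride` radians.
A, s, stride = 1.0, 0.0, 0.6   # illustrative, not the trained values
embed = {d: (A * math.cos(s + d * stride), A * math.sin(s + d * stride))
         for d in range(10)}
assert len(embed) == 10
```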
242p JackCai's split-subspace (starting point)
↓ spiral positions, rank-2 out_proj, linear correction, frozen EOS
203p tied Q/K + phase rotation, adaptive weight decay
↓ tok_dim 3→2, d_model 6→5, vocab 14→10, 1 head
133p parametric embeddings, rank-1 out_proj, no FFN bias
↓ shared norms, tied V/output, no position correction
75p frozen PLUS/EOS positions
↓ tied A=B (circular embedding) + high carry-mix training
74p previous from-scratch frontier
↓ sinusoidal positions (freeze spiral) + qk_dim 5→4
67p previous from-scratch frontier
↓ triple-duty head_proj (tie fc2 = head_proj.T)
57p ← current from-scratch frontier
```bash
# Verify (100% accuracy, ~80s)
uv run python ../AdderBoard/verify.py submission_57p/submission_57p.py

# Train from scratch (~60K steps, ~5 min)
uv run python -m microadder.train --run-name sub100_my_57p --seed 777 --tie-fc2-head

# Evaluate a checkpoint
uv run python -m microadder.eval --checkpoint results/runs/sub100_my_57p/checkpoints/best.pt
```

The training breakthrough that made sub-100p reproducible: 80% carry-heavy examples with a step-based linear fade.
Long carry chains (e.g., 9999999999 + 1) are exponentially rare under uniform sampling, yet they are the hardest test cases. Flooding early training with carry-heavy examples forces the carry circuit to form. A step-based fade (linear ramp from 80% down to 0% over steps 15K–45K) avoids the oscillation feedback loop that a metric-based fade creates at high carry-mix.
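A minimal sketch of that schedule — the 0.8 mix and the 15K/45K fade window come from the description above, while the function name and signature are assumptions:

```python
def carry_mix(step: int, hi: float = 0.8,
              start: int = 15_000, end: int = 45_000) -> float:
    """Fraction of carry-heavy examples to sample at a training step.

    Holds at `hi` until `start`, then fades linearly to 0 by `end`.
    """
    if step <= start:
        return hi
    if step >= end:
        return 0.0
    return hi * (end - step) / (end - start)
```

For example, `carry_mix(30_000)` returns 0.4, halfway through the fade.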
```
microadder/       Clean reimplementation (model + training)
  model.py        57p/67p/74p model definitions
  train.py        Training loop
  data.py         Data generation
  eval.py         Evaluation
submission_57p/   Current best submission (57p)
submission_67p/   Previous best (67p)
submission_74p/   Older best (74p)
src/              Original research codebase (full feature set)
PAPER.md          Full write-up of the research
ARCHITECTURE.md   How the model performs addition (weight analysis)
RESEARCH.md       Detailed research notes
JOURNEY.md        Chronological research narrative
REFLECTION.md     Retrospective analysis
```
Built on the work of:
- JackCai1206 — the 242p split-subspace architecture
- Wonderfall (param_40) — tied Q/K with phase rotation
- Alex Litzenberger (TinyAdder) — 36p hand-coded model, the representational floor
- Dimitris Papailiopoulos — the AdderBoard challenge
Made by Arseniy Zarechnev with help from Claude.
```bibtex
@misc{zarechnev2026microadder,
  author = {Arseniy Zarechnev and Claude},
  title  = {57 Parameters Is All You Need: Training a Transformer for Perfect 10-Digit Addition},
  year   = {2026},
  url    = {https://github.com/evindor/MicroAdder},
}
```