Training Status
Live updates on TinyMemoryLM training progress. Updated whenever I remember to check.
Model Training
Status: Training
- Parameters: ~1M
- Context: 2K tokens
- Layers: 6
- Heads: 4
- Dimension: 160
- FFN Dim: 256
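As a quick reference, here are the headline numbers above as a config object. This is a minimal sketch: the class and field names are mine, and the 2K context is assumed to mean 2048 tokens; only the values come from the table.

```python
from dataclasses import dataclass

@dataclass
class TinyMemoryLMConfig:
    # Hypothetical config mirroring the table above; field names are mine.
    context_len: int = 2048  # "2K" context, assuming 2048 tokens
    n_layers: int = 6
    n_heads: int = 4
    d_model: int = 160       # "Dimension"
    d_ffn: int = 256         # "FFN Dim"
    # Total parameter count works out to roughly ~1M at these sizes.
```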
Architecture Features
- Recurrent Memory (Chunk-GRU): Enabled
- Precision Codebook (Output Bias): Enabled (2111 params)
- Makeshift MTP: Enabled (horizons 2, 3, 4; weight 0.3; sketched below)
- Gradient Checkpointing: Disabled
- Torch Compile: Disabled
- Chunked Attention: Enabled
- Flash Attention: Enabled
- Repetition Penalty: Disabled (1.0)
- Tied Embeddings: Enabled
- Output Logit Bias: Enabled
- Word Token Loss Boost: Enabled (3x)
- Response-Start Boost: Enabled (3x, first 20 tokens)
- Entropy Regularization: Disabled
- QK-Norm (RMSNorm): Enabled
- SwiGLU FFN: Enabled
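For the "Makeshift MTP" (multi-token prediction) entry: a minimal sketch of what an auxiliary loss with horizons 2, 3, 4 and weight 0.3 could look like. The function name and the shifted-target scheme are my assumptions; a non-makeshift MTP would typically use a separate prediction head per horizon. Only the horizons and the 0.3 weight come from the table.

```python
import torch
import torch.nn.functional as F

def mtp_loss(logits, targets, horizons=(2, 3, 4), mtp_weight=0.3):
    """Next-token loss plus a weighted multi-token-prediction term.

    logits:  (batch, seq, vocab) -- predictions at each position
    targets: (batch, seq)        -- token ids

    Assumption: the same logits predict tokens k steps ahead by
    shifting targets, rather than using per-horizon heads.
    """
    # Standard next-token (horizon 1) cross-entropy.
    main = F.cross_entropy(
        logits[:, :-1].flatten(0, 1), targets[:, 1:].flatten()
    )

    # Auxiliary losses: reuse the logits to predict k steps ahead.
    aux = 0.0
    for k in horizons:
        aux = aux + F.cross_entropy(
            logits[:, :-k].flatten(0, 1), targets[:, k:].flatten()
        )
    aux = aux / len(horizons)

    return main + mtp_weight * aux
```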
Recurrent Memory (Chunk-GRU)
- Chunk Size: 8
- Memory Dim: 32
- Cell Type: GRU
- Layers: 4
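A minimal sketch of how a chunk-level GRU memory could work with the numbers above (chunk size 8, memory dim 32): split the sequence into chunks, summarize each chunk, and step a GRU once per chunk so a memory state is carried across them. The mean-pooling summary and how the memory states feed back into the model are my assumptions, and I show a single cell where the table lists 4 layers (presumably one memory module per covered transformer layer).

```python
import torch
import torch.nn as nn

class ChunkGRUMemory(nn.Module):
    """Carries a recurrent memory state across fixed-size chunks.

    Hypothetical sketch: mean-pool each chunk, step a GRU cell once
    per chunk, and return one memory vector per chunk for the model
    to read (e.g. via attention).
    """

    def __init__(self, d_model=160, mem_dim=32, chunk_size=8):
        super().__init__()
        self.chunk_size = chunk_size
        self.cell = nn.GRUCell(input_size=d_model, hidden_size=mem_dim)

    def forward(self, x):
        # x: (batch, seq, d_model); seq assumed divisible by chunk_size
        b, s, d = x.shape
        chunks = x.view(b, s // self.chunk_size, self.chunk_size, d)
        summaries = chunks.mean(dim=2)        # (batch, n_chunks, d_model)

        h = x.new_zeros(b, self.cell.hidden_size)
        states = []
        for t in range(summaries.size(1)):
            h = self.cell(summaries[:, t], h)  # update memory per chunk
            states.append(h)
        return torch.stack(states, dim=1)      # (batch, n_chunks, mem_dim)
```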
Training Datasets
- HuggingFaceFW/fineweb_100BT: Pretraining
- mattwesney/General_Inquiry_Thinking-Chain-Of-Thought: Instruction Tuning
- tatsu-lab/alpaca: Instruction Tuning
- databricks/databricks-dolly-15k: Instruction Tuning
- TeichAI/Step-3.5-Flash-2600x: Generalization
- TeichAI/convo-v1: Generalization (2x)
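A sketch of how the post-pretraining mix could be assembled with Hugging Face `datasets`, treating the "(2x)" on convo-v1 as double sampling probability. Dataset IDs are copied verbatim from the table; the splits, the interleaving approach, and the interpretation of "(2x)" are my assumptions, and the fineweb pretraining corpus is left out since it belongs to a separate phase.

```python
from datasets import load_dataset, interleave_datasets

# (dataset id, relative sampling weight) -- "(2x)" modeled as weight 2.0
sources = [
    ("mattwesney/General_Inquiry_Thinking-Chain-Of-Thought", 1.0),
    ("tatsu-lab/alpaca", 1.0),
    ("databricks/databricks-dolly-15k", 1.0),
    ("TeichAI/Step-3.5-Flash-2600x", 1.0),
    ("TeichAI/convo-v1", 2.0),
]

datasets = [load_dataset(name, split="train") for name, _ in sources]
weights = [w for _, w in sources]
probs = [w / sum(weights) for w in weights]

# Sample examples from each source in proportion to its weight.
mix = interleave_datasets(datasets, probabilities=probs, seed=42)
```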
Training Log
- 2026-03-01: Running
- 2026-02-28: Done
- 2026-02-27: Done
- 2026-02-26: Done
Want to follow along with the training adventures?
Read the Blog