A ~1M Parameter Model
with 2K Context
TinyMemoryLM is a hybrid word-character transformer trained on an RTX 5090. It features recurrent memory, a precision codebook output head, and DeepSeek-V3-style multi-token prediction (MTP). External memory gives it recall abilities. A codebook handles precision. Multi-token prediction improves output quality. It still forgets where it put its keys though.
In Partnership With
Download CompactAI Studio
Run our AI models locally on your machine. Chat with models, browse the catalog, and download models for offline use.
Built with Electron.
Architecture Features
A fresh take on the transformer architecture.
Recurrent Memory (Chunk-GRU)
A recurrent memory module with chunk-level GRU processing is integrated into the architecture. It processes the sequence chunk by chunk, carrying a recurrent state across the context window and giving the model memory beyond what attention alone can handle.
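The chunk-level recurrence can be sketched like this: summarize each chunk, then feed the summaries through a single GRU cell so state carries across the whole context. This is a minimal illustration, not the actual implementation; the mean-pooled chunk summary, the sizes, and the parameter initialization are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hidden/embedding size (illustrative, not the real config)

# Random GRU parameters for the chunk-level recurrence (illustrative).
Wz, Uz, bz = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1, np.zeros(d)
Wr, Ur, br = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1, np.zeros(d)
Wh, Uh, bh = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1, np.zeros(d)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h):
    """One GRU update: x is a chunk summary, h is the running memory state."""
    z = sigmoid(x @ Wz + h @ Uz + bz)            # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)            # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)
    return (1 - z) * h + z * h_tilde

def chunk_memory(token_embs, chunk_size=8):
    """Mean-pool each chunk into a summary, then carry a GRU state across chunks."""
    h = np.zeros(d)
    states = []
    for start in range(0, len(token_embs), chunk_size):
        summary = token_embs[start:start + chunk_size].mean(axis=0)
        h = gru_step(summary, h)
        states.append(h)
    return np.stack(states)

seq = rng.normal(size=(32, d))   # 32 token embeddings -> 4 chunks of 8
mem = chunk_memory(seq)          # one memory state per chunk
```

Each memory state can then be exposed to later layers, which is how the recurrence reaches past the attention window.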
Precision Codebook Output Head
Tied input-output embeddings with a learnable per-token output bias. Instead of a separate codebook projection, the model ties the input embedding matrix to the output weights and learns a 2111-entry bias vector to compensate for word-token suppression. Simple, parameter-efficient, and surprisingly effective.
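In code, the tied head is one matrix product plus a bias. A minimal sketch, assuming the ~2111-token vocabulary from the tokenizer; the hidden size and initialization are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 2111, 16                      # ~2111-token vocab; d is illustrative

E = rng.normal(size=(vocab, d)) * 0.02   # input embedding matrix, reused as output weights
b = np.zeros(vocab)                      # learnable per-token output bias

def output_logits(h):
    """Tied output head: project onto the input embeddings, then add the bias.

    The bias lets training boost or suppress individual tokens (e.g. to
    counteract word-token suppression) for only `vocab` extra parameters,
    versus vocab * d for a separate projection.
    """
    return h @ E.T + b

h = rng.normal(size=d)                   # a final-layer hidden state
scores = output_logits(h)                # one logit per vocabulary entry
```

The parameter saving is the point: 2111 bias entries instead of a full 2111 × d projection matrix.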
Makeshift MTP
DeepSeek-V3-style multi-token prediction with horizons of 2, 3, and 4 tokens. MTP adapters learn to predict multiple future tokens simultaneously; during generation, branch selection over these predictions improves sample quality. Pretrain loss weight: 0.3, SFT loss weight: 0.3.
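The loss weighting is simple arithmetic. A sketch, assuming the per-horizon cross-entropies are averaged before the 0.3 weight is applied (the averaging is an assumption; the 0.3 matches the stated pretrain/SFT setting):

```python
import numpy as np

def total_loss(main_loss, horizon_losses, mtp_weight=0.3):
    """Combine the next-token loss with the weighted multi-horizon losses.

    horizon_losses holds one cross-entropy value per horizon (2, 3, 4).
    Averaging-then-weighting is an assumption about the combination rule.
    """
    return main_loss + mtp_weight * float(np.mean(horizon_losses))

loss = total_loss(2.0, [2.4, 2.6, 2.8])  # 2.0 + 0.3 * 2.6 = 2.78
```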
RTX 5090 Optimized
Tuned for the RTX 5090 with flash attention, bf16 mixed precision, and a batch size of 64. Uses the PyTorch Inductor backend with coordinate_descent_tuning enabled. Gradient checkpointing and torch.compile are available but disabled for the Haiku tier. Stability takes priority over speed.
Hybrid Word-Character Tokenizer
Hybrid word-character tokenizer with a ~2111-token vocabulary. It scans the training data for the 2000 most frequent words and falls back to characters for everything else, achieving 3-4x compression vs pure character-level tokenization. Supports special format tokens for instruction tuning: <|user|>, <|assistant|>, <|system|>, <|begin_of_thought|>, <|end_of_thought|>.
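The word-with-character-fallback scheme fits in a few lines. A minimal sketch (the boundary handling and helper names are illustrative, not the project's actual tokenizer):

```python
from collections import Counter

def build_vocab(corpus, max_words=2000):
    """Keep the most frequent words; everything else falls back to characters."""
    words = Counter(w for line in corpus for w in line.split())
    word_vocab = {w for w, _ in words.most_common(max_words)}
    chars = sorted({c for line in corpus for c in line})
    specials = ["<|user|>", "<|assistant|>", "<|system|>",
                "<|begin_of_thought|>", "<|end_of_thought|>"]
    return word_vocab, chars, specials

def tokenize(text, word_vocab):
    """Emit one token for an in-vocab word, else spell it out in characters."""
    out = []
    for word in text.split():
        if word in word_vocab:
            out.append(word)
        else:
            out.extend(word)       # character fallback for rare words
        out.append(" ")            # keep word boundaries recoverable
    return out[:-1]

corpus = ["the cat sat on the mat", "the dog sat"]
word_vocab, chars, specials = build_vocab(corpus, max_words=3)
toks = tokenize("the zebra sat", word_vocab)
# "the" and "sat" are single tokens; "zebra" is spelled out character by character
```

Common words collapse to one token each, which is where the 3-4x compression over pure character-level tokenization comes from.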
The Architecture
Simple on paper, complicated in practice. Just like your relationship with your co-workers.
Model Series
Three tiers following Chinchilla scaling. We borrowed the naming scheme and we are quite pleased with ourselves.
Lightweight and experimental. Updated frequently. The scrappy underdog.
Balanced and stable. Updated less often. The responsible middle child.
Maximum quality. Heavy and most stable. The overachiever who never sleeps.
Sample Output
What happens when you train on English text and hope for the best.
gradient descends like rain
loss slowly fades
AIFinder
A tool that snitches on AI models. Every AI has a writing accent. AIFinder detects it.
Which AI Wrote This?
Paste any AI-generated text and AIFinder will guess which lab made it. Google, Anthropic, OpenAI, DeepSeek, xAI, and more. It learns from corrections. The more you use it, the smarter it gets.
Free API available | 60 requests/min | No API key required
YES WE KNOW IT SUCKS
The tool guesses wrong sometimes. It confuses Anthropic with OpenAI. It confidently identifies Google as DeepSeek. It is basically a parrot with an opinion.
Pro tip: Ask it math and reasoning questions. We trained it on huge amounts of TeichAI datasets (check them out at huggingface.co/TeichAI). It is noticeably better at detecting which math-happy lab produced the output.
That said, I have an AI working on fixing it. I could not be bothered to do it manually.
7+ hours
The AI is trying its best. Poor thing.

