Training on RTX 5090

A ~1M Parameter Model with 2K Context

TinyMemoryLM is a hybrid word-character transformer trained on an RTX 5090. It features recurrent memory, a precision codebook output head, and DeepSeek-V3-style multi-token prediction (MTP). External memory gives it recall abilities, the codebook handles precision, and multi-token prediction improves output quality. It still forgets where it put its keys, though.

TRAINING THREE MODEL TIERS

Haiku ~1M params (Live) | Sonnet ~300M params (In Training) | Opus ~600M params (In Training)

Download CompactAI Studio

Run our AI models locally on your machine: chat with them, browse what's available, and download models for offline use.

Built with Electron.

~1M Parameters
2K Context Length
6 Layers
4 Attention Heads
160 Model Dimension
229 FFN Dimension

Architecture Features

A fresh take on the transformer architecture.

Recurrent Memory (Chunk-GRU)

A recurrent memory module with chunk-level GRU processing is integrated into the architecture. It processes the sequence chunk by chunk, carrying a memory state across the context window and giving the model external recall beyond what attention alone can handle.
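
A minimal sketch of the idea in PyTorch, assuming mean-pooled chunk summaries and a single GRU state per sequence; the names and pooling choice are illustrative, not the actual TinyMemoryLM code:

import torch
import torch.nn as nn

class ChunkGRUMemory(nn.Module):
    def __init__(self, d_model: int, chunk_size: int = 128):
        super().__init__()
        self.chunk_size = chunk_size
        # One GRU cell carries a memory state from chunk to chunk.
        self.cell = nn.GRUCell(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, d_model = x.shape
        state = x.new_zeros(batch, d_model)
        outputs = []
        for start in range(0, seq_len, self.chunk_size):
            chunk = x[:, start:start + self.chunk_size]
            # Summarize the chunk (mean pool) and update the memory.
            state = self.cell(chunk.mean(dim=1), state)
            # Broadcast the updated memory onto every position.
            outputs.append(chunk + state.unsqueeze(1))
        return torch.cat(outputs, dim=1)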

Precision Codebook Output Head

Tied weight embeddings with a learnable per-token output bias. Instead of a separate codebook projection, the model ties input embeddings to output weights and learns a 2111-parameter bias vector to compensate for word-token suppression. Simple, parameter-efficient, and surprisingly effective.
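
In code, the head might look like this (a hedged sketch; the class name and wiring are assumptions, but the tied weights and ~2111-entry bias follow the description above):

import torch
import torch.nn as nn

class TiedOutputHead(nn.Module):
    def __init__(self, embedding: nn.Embedding):
        super().__init__()
        # Reuse the input embedding matrix as the output projection.
        self.embedding = embedding
        # One learnable bias per token (~2111 parameters) to offset
        # word-token suppression.
        self.bias = nn.Parameter(torch.zeros(embedding.num_embeddings))

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model) -> logits: (batch, seq, vocab)
        return hidden @ self.embedding.weight.T + self.bias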

Makeshift MTP

DeepSeek-V3 style Multi-Token Prediction with horizons (2, 3, 4). MTP adapters learn to predict multiple future tokens simultaneously, improving sample quality through branch selection during generation. Pretrain weight: 0.3, SFT weight: 0.3.
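
A rough sketch of how the weighted auxiliary loss might be computed, assuming each MTP adapter projects hidden states to vocabulary logits (illustrative; only the horizons and the 0.3 weight come from this page):

import torch.nn.functional as F

def mtp_loss(hidden, mtp_heads, targets, horizons=(2, 3, 4), weight=0.3):
    # hidden: (batch, seq, d_model); targets: (batch, seq) token ids.
    loss = 0.0
    for head, h in zip(mtp_heads, horizons):
        logits = head(hidden[:, :-h])   # predict the token h steps ahead
        future = targets[:, h:]
        loss = loss + F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), future.reshape(-1))
    # Averaged over horizons and scaled before adding to the main loss.
    return weight * loss / len(horizons)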

RTX 5090 Optimized

Tuned for the RTX 5090 with flash attention, bf16 mixed precision, and a batch size of 64. PyTorch Inductor is configured with coordinate_descent_tuning enabled; gradient checkpointing and torch.compile are available but disabled for the Haiku tier. Stability takes priority over speed.
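
The corresponding setup might look roughly like this (the Inductor flag and autocast call are real PyTorch options; the rest is an illustrative sketch):

import torch
import torch._inductor.config as inductor_config

inductor_config.coordinate_descent_tuning = True

use_compile = False             # available, but off for Haiku
use_grad_checkpointing = False  # likewise: stability over speed

model = ...  # TinyMemoryLM instance (placeholder)
if use_compile:
    model = torch.compile(model, backend="inductor")

# bf16 mixed precision; scaled_dot_product_attention dispatches to a
# flash-attention kernel on supported GPUs.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    ...  # forward/backward over batches of 64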

Hybrid Word-Character Tokenizer

A hybrid word-character tokenizer with a ~2111-token vocabulary. It scans the datasets for the top 2000 most frequent words and falls back to characters for everything else, achieving 3-4x compression versus a pure character-level tokenizer. Special format tokens support instruction tuning: <|user|>, <|assistant|>, <|system|>, <|begin_of_thought|>, <|end_of_thought|>.
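
A toy version of the idea, assuming whole-word tokens for the top-K words plus a character fallback (whitespace handling and special-token splitting are elided for brevity):

from collections import Counter

class HybridTokenizer:
    SPECIALS = ["<|user|>", "<|assistant|>", "<|system|>",
                "<|begin_of_thought|>", "<|end_of_thought|>"]

    def __init__(self, corpus: str, top_k: int = 2000):
        # Whole-word tokens for the most frequent words...
        counts = Counter(corpus.split())
        words = [w for w, _ in counts.most_common(top_k)]
        # ...plus every character seen, as the fallback alphabet.
        chars = sorted(set(corpus))
        self.vocab = self.SPECIALS + words + chars
        self.token_to_id = {t: i for i, t in enumerate(self.vocab)}

    def encode(self, text: str) -> list[int]:
        ids = []
        for word in text.split():
            if word in self.token_to_id:   # known word or special token
                ids.append(self.token_to_id[word])
            else:                          # rare word: character fallback
                ids.extend(self.token_to_id[c] for c in word
                           if c in self.token_to_id)
        return ids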

The Architecture

Simple on paper, complicated in practice. Just like your relationship with your co-workers.

Input → Character Embedding
Transformer Block ×6 (RMSNorm, QK-Norm, SwiGLU FFN)
MTP Adapters ×3 (horizons 2, 3, 4)
Tied Output Head (learnable bias, 2111 params)
Output → ~2.1K Hybrid Vocab

d_model 160 | heads 4 | ffn_dim 229 | mtp_horizons [2, 3, 4] | vocab_size ~2111 | seq_len 2048
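
Collected as a config object, the Haiku-tier hyperparameters above might read as follows (field names are illustrative, not the project's actual config):

from dataclasses import dataclass

@dataclass
class HaikuConfig:
    d_model: int = 160
    n_layers: int = 6
    n_heads: int = 4
    ffn_dim: int = 229
    mtp_horizons: tuple = (2, 3, 4)
    vocab_size: int = 2111   # approximate hybrid vocab
    seq_len: int = 2048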

Model Series

Three tiers following Chinchilla scaling. We borrowed the naming scheme and we are quite pleased with ourselves.

Haiku ~1M params

Lightweight and experimental. Updated frequently. The scrappy underdog.

dim 160 | layers 6 | heads 4 | ffn_dim 229 | context 2,048 | lr 8e-4
Sonnet ~300M params

Balanced and stable. Updated less often. The responsible middle child.

dim 768 | layers 36 | heads 12 | ffn_dim 2,538 | context 2,048 | lr 2e-4
Opus ~600M params

Maximum quality. Heavy and most stable. The overachiever who never sleeps.

dim 1,024 | layers 39 | heads 16 | ffn_dim 3,557 | context 2,048 | lr 1.6e-4

Sample Output

What happens when you train on English text and hope for the best.

tinyMemoryLM --sample
> Write a haiku about neural networks
< weights dance in dark
gradient descends like rain
loss slowly fades
> What is the meaning of life?
< 42, obviously. Though I suspect the question was rhetorical. Unless you count the time I spent learning that "the" is the most common token. That has been 38% of my existence. It is a living.

AIFinder

A tool that snitches on AI models. Every AI has a writing accent. AIFinder detects it.

Which AI Wrote This?

Paste any AI-generated text and AIFinder will guess which lab made it. Google, Anthropic, OpenAI, DeepSeek, xAI, and more. It learns from corrections. The more you use it, the smarter it gets.

Anthropic | DeepSeek | Google | OpenAI | xAI | Mistral | MiniMax | +4 more

Free API available | 60 requests/min | No API key required
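
Calling it might look something like this (the endpoint URL and response fields are placeholders, not documented API details; only the no-key and 60 requests/min limits come from above):

import requests

resp = requests.post(
    "https://example.com/aifinder/detect",  # placeholder endpoint
    json={"text": "Paste AI-generated text here."},
    timeout=10,
)
print(resp.json())  # e.g. {"lab": "...", "confidence": ...} (assumed shape)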

YES WE KNOW IT SUCKS

The tool guesses wrong sometimes. It confuses Anthropic with OpenAI. It confidently identifies Google as DeepSeek. It is basically a parrot with an opinion.

Pro tip: Ask it math and reasoning questions. We trained it on huge amounts of TeichAI datasets (check them out at huggingface.co/TeichAI). It is noticeably better at detecting which math-happy lab produced the output.

That said, I have an AI working on fixing it. I could not be bothered to do it manually.

7+ hours and counting
The AI is trying its best. Poor thing.