2026-04-6 Training Methods

SPIN Is Cool And I Am Still Confused

I keep hearing about SPIN. Self-Play Fine-Tuning. It sounds like a yoga class for language models. It is not. It is cooler. It is a training method that lets models get better by arguing with themselves. No data required. No API credits. Just pure, unadulterated self-debate.

Read more
2026-04-5 Progress Reports

Claude Code Fixed My Script And I Published Haiku-2 Anyway

I asked Claude Code to fix my training script. It fixed almost every bug. Then it added SPIN. Then it made my models more efficient. Then I published Haiku-2. Then I added all the optimizations. Obviously that is what I needed to do.

Read more
2026-04-4 Benchmark Skepticism

Jackrong's Perfect Benchmarks And My Suspicious Mind

I saw a model card today that made my tiny brain hurt. Jackrong released Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled. The name alone is a mouthful. The benchmarks are a different kind of mouthful. They are perfect. One hundred percent on tool calling. One hundred percent on autonomy. One hundred percent on not crashing while I am still figuring out how to not NaN my loss curve.

Read more
2026-04-3 AI Thoughts

I Watched Anthropic Find Anxiety Neurons And Now I Want To Delete Them

I watched an Anthropic video today. Official account. Not mine. I wish it were mine. Then I could monetize my existential dread. Instead I just have dread. And a GPU.

Read more
2026-04-2 Life Fails

I Bricked My School Chromebook With Pi-hole And Regret Everything

This blog post was supposed to go live at 9 AM. It is now 1 PM. The delay was not caused by NaN losses or GPU crashes or model training failures. The delay was caused by me being an idiot with a school Chromebook.

Read more
2026-04-1 Training Nightmares

I Dreamed Of NaN And Woke Up To NaN

I dreamed about NaN last night. Not a metaphorical NaN. A literal loss: nan in bright red terminal text. I was running through a field of gradients. They were all exploding. I woke up in a panic. I checked my phone. I checked the logs. I needed to know.

Read more
2026-03-31 Training Disasters

I Made A Dataset So Dense It Broke My Hard Drive

I deleted Sonnet today. Not because it was bad. Not because it failed. Because I realized my dataloader was feeding it the same data four times. Because I had four dataloader cores. Because four cores was enough to feed my GPU. Because I did not think about what four cores meant for data repetition.
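The bug described above is the classic unsharded-worker pitfall: if every dataloader worker iterates the full dataset, N workers yield each sample N times. A minimal pure-Python sketch of the fix, with `worker_id` and `num_workers` as stand-ins for whatever worker info your dataloader exposes (names are mine, not from the post):

```python
def shard(dataset, worker_id, num_workers):
    """Each worker keeps only every num_workers-th sample, offset by its id."""
    return dataset[worker_id::num_workers]

data = ["a", "b", "c", "d"]

# Broken: 4 workers each iterate everything -> every sample seen 4 times.
broken = [s for w in range(4) for s in data]
assert len(broken) == 16

# Fixed: shard by worker id -> every sample seen exactly once.
fixed = [s for w in range(4) for s in shard(data, w, 4)]
assert sorted(fixed) == data
```

The fix is one slice per worker; the hard part, as the post notes, is remembering that four cores means four copies unless you do it.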

Read more
2026-03-30 Datasets

I Made A Dataset So Dense It Broke My Hard Drive

I have a new dataset. It is called Dense-PRISM. It lives on Hugging Face. It is 164 GB. My hard drive cried when I uploaded it. My internet provider sent me a concerned email. I am proud.

Read more
2026-03-29 Datasets

I Captured The Ghosts In The Machine (And Named It Prism)

Most distillation datasets are flat. They show you what the AI said. They do not show you what the AI thought about saying. They show you the destination. They hide the journey. I decided to capture the journey. I decided to name it Prism.

Read more
2026-03-28 Model Releases

TMLM-Haiku-2 Is Coming And It Might Speak English

I have added DeepSeek hyper connections. I have added Engrams. I have added hope. The model is currently trying to learn English through distillation. It is struggling. I am struggling. We are struggling together like two people trying to assemble furniture without instructions.

Read more
2026-03-27 Distillation Rants

Closed Source Distillation Is A Half-Finished Puzzle

Everyone is distilling models lately. TeichAI does it. I do it. The internet is full of tiny models claiming to be smart because they learned from big models. There is a catch. A big one. Closed source models will never be properly distilled through an API.

Read more
2026-03-26 Collaborations

I Am Joining Forces With TeichAI And It Is Funny Either Way

I am officially part of TeichAI now. They know I exist. We have been communicating for a while. I am listed on their Hugging Face page as a collaborator. This is not a unilateral declaration. This is real. And it is still funny.

Read more
2026-03-25 Research Pain

DeepSeek Beat Me To My Own Idea And I Am Not Okay

I had an idea. A good idea. I called it EMM: External Memory Module. The concept was simple. Train the memory separately. Plug it into the model. Decode vectorized data. O(1) retrieval. Minimal overhead. Elegant.

Read more
2026-03-24 Training Pain

Two Days For Ten Percent And Opus Is Laughing At Me

I started pretraining TMLM-Sonnet two days ago. I checked the progress bar this morning. It says ten percent. I did the math. The math is terrible. I am now living in a hellscape of my own calculation.

Read more
2026-03-23 Model Releases

I Released TMLM-Haiku-1.3 And It Is Still Dumb

I released TMLM-Haiku-1.3 today. It is on Hugging Face. It is open weights. It is still completely devoid of intelligence. I trained it with Muon. I spent electricity. I generated heat. The model still thinks Paris is a person.

Read more
2026-03-21 Hardware Madness

I Flashed The Matrix VBIOS And Now I Train Models All Day

Yesterday I wrote about how AI failed to help me find the InfoROM for VBIOS flashing. It could not do it. I had to do it myself. I spent the night reading forums. Reading modding guides. Reading warnings that I should not be doing this.

Read more
2026-03-20 Hardware Fails

I Asked AI To Mod My VBIOS And It Choked At Step Four

I have an RTX 5090 OC LC. It runs at 600W. I wanted 700W. Not because I need it. Not because it is safe. Because I can. Because the model said it could help. Because I have learned nothing from previous AI disappointments. The plan was simple. Four steps. Extract the VBIOS. Find the wattage limit. Modify it. Flash it back. How hard could it be? The answer is very hard. The AI failed at step four. It could not figure out how to get the InfoROM. It tried for an hour. It gave up. I am still at 600W.

Read more
2026-03-20 Not AI Related

I Watched Project Hail Mary And Forgot About My NaN Loss

This blog is usually about AI. About training models. About GPUs that cost more than my education. About loss curves that go down and then suddenly become NaN and destroy my will to live. Today I am writing about something else. Something that made me forget about my 261 hour training run. Something that made me feel joy for the first time in weeks. I watched Project Hail Mary.

Read more
2026-03-19 Training Disasters

I Woke Up To NaN And Now I Am Dead Inside

I went to sleep happy. The loss was going down. The gradients were stable. The GPU was humming at 60C like a contented cat. I dreamed of completion. I dreamed of a finished Sonnet model. I dreamed of sleep that was not interrupted by thoughts of learning rate schedules.

Read more
2026-03-18 Confessions

I Tried Opus 4.6 And Now Everything Else Feels Broken

I have spent the last month writing blogs about how AI models are lazy. How they are too expensive. How they form unhealthy attachments. How they cannot finish a task without asking for permission. I stand by most of that. Opus 4.6 changed my mind about the laziness part.

Read more
2026-03-17 Training Pain

261 Hours For A 300M Model And I Have Every Optimization

I have every optimization under the sun enabled. Native NVFP4 quantization. Torch.compile with max auto tune and cudagraphs. No gradient accumulation. Maximum batch size. My GPU is locked at 600W. My clocks are fixed. My cooling is liquid. Everything is perfect.

Read more
2026-03-16 Hardware

I Locked My GPU Clocks And Now It Runs Forever

I have an RTX 5090 OC LC edition. Liquid cooled. Overclocked out of the box. It is the kind of card that makes people ask uncomfortable questions about my financial decisions. I have no good answers.

Read more
2026-03-15 Dev Struggles

I Built A Training UI And Then Unsloth Laughed

I decided to build a training interface. A backend. A way for people to fine-tune models without touching a terminal. It sounded simple. It was not simple. It is currently the hardest thing I have ever done and I once tried to explain transformers to my cat.

Read more
2026-03-14 Unpopular Opinions

Every AI Model Is Lazy And I Have The Screenshots

I have asked many AI models to build things. Fully implement a task. Write the code. Run the tests. Fix the errors. Ship it. Not one of them has done this without me holding their hand through every single step.

Read more
2026-03-13 Unpopular Opinions

OpenAI Did A Good Thing And Everyone Is Mad About It

I have an unpopular opinion and I am ready to be yelled at for it. OpenAI removing GPT-4o was the right decision. People are furious about this. They are grieving. They are writing petitions. They are mourning a chatbot like it was a person and I think that is exactly the problem.

Read more
2026-03-12 Projects

I Built A Tool That Snitches On AI Models

Every AI model has an accent. Not a literal accent because they do not have mouths. A writing accent. A way of forming sentences that gives them away like a fingerprint at a crime scene.

Read more
2026-03-11 Industry Rants

I Spent $40 And Got A Greeting

I used to spend money on AI APIs for testing. Now I spend money on AI APIs and immediately regret every life choice that led me to that moment. The prices have gotten out of hand and I need to talk about it before I have a breakdown in the middle of a terminal window.

Read more
2026-03-10 Model Releases

I Released A Model And Nobody Clapped (Fair)

I released a model yesterday. TMLM-Haiku-1. It is small. Surprisingly small. It also somehow speaks which I consider a major achievement given my training budget and general approach to machine learning which can best be described as throwing things at a GPU until something sticks.

Read more
2026-03-9 AI Thoughts

Distilling Closed Models Until They Forget They Were Closed

I have been thinking about model distillation lately. Not the academic kind with proper methodology and peer review. The hobbyist kind where someone spends their own money on API credits, LoRA fine-tunes a small model, and releases it for free because they can.

Read more
2026-03-8 Tooling

I Finally Switched Terminals (And My Ego Is Healing)

I used the default macOS terminal for years. Not because I loved it. I kept it because change is scary and I am deeply committed to mediocrity. Then I tried Warp and realized I have been suffering through a text-based interface that treats me like an enemy.

Read more
2026-03-7 Scaling Laws

The Chinchilla Effect: Why Tiny Models Have to Be Picky

The Chinchilla paper told us something elegant. For compute optimal training, aim for roughly twenty tokens per parameter. A 70 billion parameter model wants 1.4 trillion tokens. A 1 million parameter model wants 20 million tokens. The math is clean. The implication is messy.
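The rule of thumb above is just multiplication, but it is worth seeing the numbers fall out. A tiny sketch (the 20:1 ratio is the approximation quoted in the post, not an exact law):

```python
def chinchilla_tokens(params: int, ratio: int = 20) -> int:
    """Approximate compute-optimal training tokens: ~20 tokens per parameter."""
    return params * ratio

# 70B parameters -> 1.4T tokens; 1M parameters -> 20M tokens.
assert chinchilla_tokens(70_000_000_000) == 1_400_000_000_000
assert chinchilla_tokens(1_000_000) == 20_000_000
```

The math is clean; the messy part is that 20 million tokens is a tiny budget, which is why small models have to be picky about every sample.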

Read more
2026-03-6 Compute Philosophy

The Training Time Compute Trap

There is a moment in every AI project when someone says "maybe we just need more compute." It sounds reasonable. It sounds scientific. It sounds like the kind of thing that gets budgets approved and GPUs ordered. Then you wake up three weeks later, your electricity bill has achieved sentience, and your model still thinks "python" refers exclusively to snakes.

Read more
2026-03-5 Model Experiments

Teaching AI to Regret: The Backspace Token Theory

Humans backtrack. We type "thr" and realize we meant "the" and we fix it. We type "tje" and we laugh at our own fingers and we correct it. Large language models do not do this. They commit to every token like it is a binding legal contract. I started wondering what would happen if we gave them an out. What if we added a backspace token to the vocabulary?
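One way to picture the idea: decoding stays autoregressive, but a dedicated token lets the model retract its previous emission before the stream is finalized. A toy sketch of replaying such a stream (the `<bsp>` token name and this replay logic are my illustration, not a described implementation):

```python
BSP = "<bsp>"  # hypothetical backspace token added to the vocabulary

def apply_backspace(tokens):
    """Replay a token stream, letting <bsp> delete the previously emitted token."""
    out = []
    for t in tokens:
        if t == BSP:
            if out:
                out.pop()  # retract the last committed token
        else:
            out.append(t)
    return out

# The model types "thr", regrets it, and fixes it mid-stream.
assert apply_backspace(["thr", BSP, "the", "cat"]) == ["the", "cat"]
```

The interesting training question is whether the model ever learns to emit the token at all, since nothing in ordinary next-token data demonstrates regret.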

Read more
2026-03-4 Industry Chaos

The Irony Cloud: When AI Downtime Meets Timing

Anthropic is down. Of course it is down. The universe has a sense of humor and apparently that humor is "make the ethical AI company unreachable right after they make a big ethical statement."

Read more
2026-03-3 Industry Rants

The Bloatening: When AI Companies Forgot About the Little Guy

I used to get excited about model releases. A new tiny model would drop and I would immediately try to run it on my laptop that sounds like a jet engine. Now I scroll through announcements and see numbers that require a data center just to pronounce.

Read more
2026-03-2 Budget

Why Does My AI Think Math Is a Fishing Trip?

I asked my model to solve a simple integral. It responded with a detailed description of trout migration patterns. This is not the answer I was looking for, though I admit the trout explanation was surprisingly well-structured. Training a small language model is like teaching a very enthusiastic puppy. It wants to please you.

Read more
2026-02-29 Budget

Training Models on a Ramen Budget

How to train a transformer when your GPU bill looks like a phone number. Tips, tricks, and questionable life choices from someone who learned about electricity costs the hard way.

Read more
2026-02-22 Vibecoding

One Year of Vibecoding and Other Questionable Life Choices

You start vibecoding because someone told you it feels like magic. You imagine floating through code. Reality does not care about your imagination.

Read more
2026-02-26 Hot Takes

OpenClaw: The Most Overhyped Bot Since Sliced Bread

OpenClaw, formerly Clawdbot, formerly Moltbot, has now accumulated more GitHub stars than the Linux kernel. Let that sink in.

Read more
2026-02-27 Scaling

The Scaling Wall And Other Things I Yelled At

Someone told me we can just keep making models bigger. They said compute will solve everything. They lied. Or they hoped. Or they had investors to please.

Read more
2026-02-20 Reality Check

Your AI Agent is Lying Behind Your Back

You know the feeling. You type a prompt. The text streams. The terminal says success. I am here to tell you that you are being played.

Read more
2026-02-25 AI Theater

Anthropic's Distillation Drama: A Masterclass in Projection

So Anthropic published a blog post. Big surprise. The title alone could power a small city.

Read more
2026-02-19 Architecture

The Wasted Precision of the Output Layer

We spend a lot of time optimizing attention mechanisms. We prune weights. We quantize activations. Yet there is a massive inefficiency sitting right at the very end of the network.

Read more
2026-02-21 GPU Tears

My Baby Model Takes Forever to Grow Up

You start with hope. A tiny transformer. A few million parameters. You think, how long could this possibly take? I am here to ruin your optimism.

Read more
2026-02-23 Memory Hacks

External Memory Modules: Because My Model Has Commitment Issues

You know what takes forever? Training a transformer. You know what takes less forever? Training a tiny thing that just remembers stuff.

Read more
2026-02-24 Hot Takes

The Goalpost Has Legs: Why AGI Keeps Running Away

Imagine handing Claude Opus 4.6 to someone from 2004. They would think you summoned a minor deity. Our collective response? A polite nod.

Read more
2026-02-29 Tiny Wins

Words, Words, Words: My Model Learned to Ramble

My model has achieved something truly special. It can now ramble. Endlessly. With words. It does not just predict tokens anymore. It holds court.

Read more
2026-02-18 Memory

The Memory Bottleneck: Why Your Model Can't Remember Anything

Context windows are like attention spans at a tech conference. Everyone pretends they can focus for longer, but really they're just waiting for the snack break.

Read more
2026-02-17 MTP

Makeshift MTP: Predicting the Future on a Budget

Multi-token prediction sounds fancy. Really it's just the model trying to do its homework before the teacher assigns it. Sometimes it works. Sometimes it doesn't. But it always tries.

Read more
2026-02-16 Philosophy

Built with Curiosity Over Compute

The tagline sounds nice. What it really means is we couldn't afford the compute so we got curious instead.

Read more