My Baby Model Takes Forever to Grow Up
You start with hope. A tiny transformer. A few million parameters. A dataset that fits on a USB stick. You think, how long could this possibly take? I am here to ruin your optimism.
The answer is: longer than you think. Way longer. Painfully longer. Your electricity bill will arrive before your loss converges longer.
The Waiting Game
Day 1: The model is learning. Loss is going down. This is amazing. Day 7: The loss is still going down. I am a genius. Day 14: Why is the loss oscillating? Did I break something? Day 30: The loss is lower than day 1. Progress has been made. Also my GPU sounds like it is about to achieve sentience and file a complaint.
But here is the thing. It is working. Slowly, annoyingly, beautifully, it is working. The model is learning. And honestly, watching it learn is kind of like watching a baby take its first steps. Painful to watch, but you cannot look away.