The Scaling Wall And Other Things I Yelled At
Someone told me we could just keep making models bigger. They said compute would solve everything. They said the curve goes up forever. They lied. Or they hoped. Or they had investors to please.
I have been yelling at my screen a lot lately. Not because the models are bad, but because they keep insisting that if we just add more parameters, more data, more compute, everything will be fine. It will not be fine. I know this because I have tried. My electricity bill knows this because it has tried too.
The Myth of Infinite Scaling
Every year, someone announces a new model that is bigger than the last. Every year, we are told this is the way forward. And every year, I look at my training runs and think, "There has to be a better way."
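To put a number on "the curve": the widely cited Chinchilla fit (Hoffmann et al., 2022) models pretraining loss as a pair of power laws plus an irreducible floor. The constants below are the paper's reported fits, quoted approximately from memory, so treat them as ballpark rather than gospel.

```latex
% Chinchilla-style loss fit (Hoffmann et al., 2022), constants approximate.
% N = parameter count, D = training tokens, E = irreducible loss floor.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad E \approx 1.69,\; \alpha \approx 0.34,\; \beta \approx 0.28
```

Both exponents sit well below one, so each doubling of parameters or data buys less improvement than the last, and no budget gets you under E. The curve does keep going. It just flattens while the invoice does not.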
There is. It is called efficiency. It is called better architectures. It is called actually understanding what we are building instead of just making it bigger. But that does not make for good press releases.
The scaling wall is not a technical limitation. It is an incentive structure problem dressed up as a research challenge.
What Actually Works
Instead of adding more parameters, what if we made each parameter do more? What if we gave the model better ways to remember things? What if we optimized for the actual use case instead of benchmarks?
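To make the first question concrete, here is a minimal sketch of low-rank factorization, one standard way to make parameters work harder. None of this comes from any particular model; the layer sizes, the rank, and the variable names are all mine, picked for illustration.

```python
# Sketch: "make each parameter do more" via low-rank factorization.
# Replace one d_in x d_out dense weight matrix with two skinny ones.
# All sizes here are illustrative, not from any real model.
import numpy as np

d_in, d_out, rank = 4096, 4096, 64

dense_params = d_in * d_out             # full weight matrix
low_rank_params = rank * (d_in + d_out)  # W approximated as A @ B

rng = np.random.default_rng(0)
A = rng.standard_normal((d_in, rank)) / np.sqrt(d_in)
B = rng.standard_normal((rank, d_out)) / np.sqrt(rank)

x = rng.standard_normal((1, d_in))
y = x @ A @ B  # same output shape as the dense layer would give

print(f"dense:    {dense_params:,} params")
print(f"low-rank: {low_rank_params:,} params "
      f"(~{100 * low_rank_params / dense_params:.1f}% of dense)")
```

Same input shape, same output shape, roughly 3% of the weights. Whether the rank-64 version actually matches the dense one on your task is an empirical question, which is rather the point: you have to measure it, not just scale past it.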
These questions are less fun to write about. They do not generate as many headlines. But answering them might actually build better models. The kind that run on reasonable hardware. The kind that regular people can use.
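And "reasonable hardware" is not hand-waving; the arithmetic is short. A back-of-envelope sketch, with an illustrative 7B-parameter model and weights only (activations, KV cache, and overhead ignored):

```python
# Back-of-envelope memory footprint for serving a model's weights.
# Model size and precision list are illustrative assumptions.
PARAMS = 7e9  # a hypothetical 7B-parameter model

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

for precision, nbytes in BYTES_PER_PARAM.items():
    gb = PARAMS * nbytes / 1e9
    print(f"{precision}: ~{gb:.1f} GB of weights")

# fp16 wants ~14 GB, which is datacenter territory.
# int4 wants ~3.5 GB, which fits on a consumer GPU.
```

The gap between 14 GB and 3.5 GB is the gap between "rent a cluster" and "run it on the machine you already own." That gap is where the interesting work is.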