The Memory Bottleneck: Why Your Model Cannot Remember Anything
Context windows are like attention spans at a tech conference. Everyone pretends they can focus for longer, but really they are just waiting for the snack break.
Transformers are the same. They have a context window, and within that window, they can see everything. But once you go beyond that window? Total amnesia. The model has no idea what happened 65K tokens ago. It is like talking to someone with severe short-term memory loss, except the patient is a neural network and the doctor is a graduate student who also has no idea what is going on.
Enter external memory. Instead of relying on the attention mechanism to remember everything, we give the model explicit memory slots it can read from and write to. It is like giving the model a diary. It can write things down. It can look them up later. It does not have to hold everything in its attention. Revolutionary concept, I know.
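To make the diary metaphor concrete, here is a minimal sketch of the idea: a key-value memory the model can write embeddings into and read back by similarity. This is a toy illustration, not any particular paper's architecture; the class name, the ring-buffer eviction policy, and the dimensions are all invented for the example.

```python
import numpy as np

class ExternalMemory:
    """Toy key-value memory: write embeddings in, read the nearest ones back.
    Purely illustrative -- not a specific published architecture."""

    def __init__(self, dim, capacity=1024):
        self.keys = np.zeros((capacity, dim))
        self.values = np.zeros((capacity, dim))
        self.capacity = capacity
        self.count = 0  # total writes so far

    def write(self, key, value):
        # Once full, overwrite the oldest slot (a simple ring-buffer policy).
        slot = self.count % self.capacity
        self.keys[slot] = key
        self.values[slot] = value
        self.count += 1

    def read(self, query, top_k=1):
        # Cosine similarity between the query and every occupied key slot.
        n = min(self.count, self.capacity)
        keys = self.keys[:n]
        sims = keys @ query / (
            np.linalg.norm(keys, axis=1) * np.linalg.norm(query) + 1e-8
        )
        # Return the values attached to the top_k most similar keys.
        idx = np.argsort(sims)[-top_k:][::-1]
        return self.values[idx]
```

The "diary" part is the write path: nothing forces the model to keep a fact in its attention once it has been stored. Retrieval is just a nearest-neighbor lookup over the keys, which is why this scales past the context window: the memory can hold far more entries than the model could ever attend to at once.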