Errors accumulate exponentially with output length: each new token is conditioned on everything generated so far, so one mistake compounds.
This is not fixable with the current architecture.
The shelf life of autoregressive LLMs is very short:
in 5 years nobody in their right mind will use them.
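The compounding argument is easy to sketch: if each token independently has some small chance e of derailing the output, the probability that an n-token generation stays entirely on-track is (1 - e)^n, which decays geometrically. (A toy model, obviously; real token errors aren't independent, and the numbers below are made up for illustration.)

```python
# Toy model of error compounding (assumes independent per-token errors,
# which real LLMs don't have, but it shows the shape of the argument).
def p_on_track(e: float, n: int) -> float:
    """Probability an n-token generation contains zero bad tokens."""
    return (1 - e) ** n

# Even a small per-token error rate becomes fatal at long lengths:
for n in (10, 100, 1000):
    print(f"{n:>5} tokens: {p_on_track(0.01, n):.4f}")
```

The exact error rate doesn't matter much; any fixed nonzero e sends (1 - e)^n toward zero as n grows.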
Bad token sampling is clearly a problem for long generations from autoregressive models. One bad token can send the LLM down a road it can't come back from. It's funny to see this with math problems: after it generates a BS solution, you can ask it to spot the problem (and it often does).
I think something like a diffusion model that iteratively refines the entire output will ultimately produce much better results for language.
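Here's a toy sketch of what "iteratively refine the entire output" could mean (nothing like a real text diffusion model; the predictor below is a hypothetical stand-in that just knows the answer with some noise). The point is that no position is ever committed left-to-right, so a position that came out wrong in one round can be revisited in the next:

```python
import random

random.seed(0)
TARGET = list("the cat sat on the mat")

def noisy_predict(pos: int, p_err: float = 0.2) -> str:
    # Hypothetical stand-in for a model: returns the right character
    # most of the time, or "?" (uncertain) otherwise.
    return TARGET[pos] if random.random() > p_err else "?"

def refine(rounds: int = 5) -> list[str]:
    """Re-propose the whole sequence each round, keeping settled positions."""
    seq = ["?"] * len(TARGET)
    for _ in range(rounds):
        # Every still-uncertain position gets another chance, in parallel;
        # nothing is locked in by the positions to its left.
        seq = [c if c != "?" else noisy_predict(i) for i, c in enumerate(seq)]
    return seq

print("".join(refine()))
```

Contrast with autoregression: there, a position is sampled once and every later token conditions on it, so a bad draw is permanent. Here the per-position failure probability shrinks with each refinement round instead of compounding across positions.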