Some made inference 5–10× faster.
Some scaled to 398B parameters.
Others rewired LLaMA-3 with Mamba layers—cutting latency without losing quality.
All of them moved beyond self-attention as the only tool for reasoning at scale.
Beyond Self-Attention: Scaling LLMs with Faster Inference and Alternative Architectures