A 9-billion-parameter State Space Model (SSM) alternative to attention is out. Recurrent models now match attention transformers such as Gemma and Mistral in quality, but because they maintain a fixed-size state vector they can offer faster inference.
9B Parameter State Space Model Rivals Attention Transformers
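The speed claim comes down to memory and compute per generated token. An attention decoder must cache every past key/value pair, so per-token work grows with sequence length; an SSM folds the whole history into a fixed-size state, so per-token work stays constant. A minimal sketch of that contrast (purely illustrative scalar toy, not the released model's code; the decay constant is an assumption):

```python
# Toy contrast: growing KV cache vs. fixed-size recurrent state.
# Both "models" are scalar stand-ins for the real mechanisms.

def attention_step(kv_cache, token):
    # Attention appends to its cache; memory grows with each token.
    kv_cache.append(token)
    return sum(kv_cache) / len(kv_cache), kv_cache

def ssm_step(state, token, decay=0.9):
    # An SSM updates a fixed-size state; memory is constant. The
    # decay value here is an arbitrary illustrative choice.
    state = decay * state + (1 - decay) * token
    return state, state

kv = []
s = 0.0
for t in [1.0, 2.0, 3.0, 4.0]:
    _, kv = attention_step(kv, t)
    _, s = ssm_step(s, t)

print(len(kv))  # cache holds 4 entries and keeps growing with length
print(s)        # a single scalar state, regardless of sequence length
```

The practical consequence: attention's per-token cost and memory scale with context length, while the SSM's stay flat, which is where the faster-inference claim comes from.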