Mamba Models Struggle with In-Context Learning Compared to Attention

We noticed that pure Mamba models struggle to develop in-context learning capabilities: they performed substantially worse than the pure attention model on three common benchmarks, while the hybrid attention–Mamba model performs on par with a pure Transformer. 3/6