Combining Transformer, Mamba, and MoE layers allows flexibility in balancing low memory usage, high throughput, and high quality. Jamba's KV cache, which becomes a limiting factor when scaling context length in pure Transformers, is 8x smaller than that of a comparable pure Transformer.
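For intuition on where that 8x comes from: Jamba keeps attention in only one of every eight layers, and Mamba layers carry no KV cache, so the cache shrinks roughly in proportion to the attention-layer count. The back-of-the-envelope sketch below works through the arithmetic; the layer count, head count, and dimensions are illustrative assumptions, not Jamba's published configuration.

```python
# Rough KV-cache comparison: pure Transformer vs. a hybrid that keeps
# attention in only 1 of every 8 layers (Jamba's reported layer ratio).
# All model dimensions here are assumed for illustration.

def kv_cache_bytes(n_attn_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """Memory for keys + values across all attention layers (fp16 by default)."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

N_LAYERS = 32       # assumed total layer count
SEQ_LEN = 256_000   # long context, where the cache dominates memory

pure = kv_cache_bytes(N_LAYERS, n_kv_heads=8, head_dim=128,
                      seq_len=SEQ_LEN, batch=1)
hybrid = kv_cache_bytes(N_LAYERS // 8, n_kv_heads=8, head_dim=128,
                        seq_len=SEQ_LEN, batch=1)

print(f"pure Transformer: {pure / 2**30:.1f} GiB")
print(f"1-in-8 hybrid:    {hybrid / 2**30:.1f} GiB ({pure / hybrid:.0f}x smaller)")
```

With these assumed numbers the pure Transformer's cache is about 31 GiB at 256K context, versus about 4 GiB for the hybrid, matching the 8x ratio since only the attention layers contribute.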