BTW, since Jamba supports a 256K context with high throughput, we also stumbled upon an issue where the fused_moe kernel didn’t work well with long contexts. Others seem to have hit this too, according to some other open issues.
Jamba’s 256K Context Reveals fused_moe Kernel Issues