3/5 Sequence packing helps transformers, but it depends on architecture-specific support that is often missing, and it introduces implementation risks for non-transformer or hybrid models such as @AI21Labs' Jamba.
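
To make the risk concrete, here is a minimal sketch in plain Python. It is a toy, not Jamba's or any library's implementation: `pack`, `block_causal_mask`, and `linear_scan` are hypothetical helper names, and the recurrence `h_t = decay * h_{t-1} + x_t` only stands in for a state-space style layer. An attention layer can isolate packed sequences with a block-diagonal causal mask, whereas a recurrent layer needs its own explicit state reset at every sequence boundary. Without that reset, state from one sequence leaks into the next.

```python
def pack(seqs):
    """Concatenate sequences into one row and record which sequence each token came from."""
    tokens, seg_ids = [], []
    for i, seq in enumerate(seqs):
        tokens.extend(seq)
        seg_ids.extend([i] * len(seq))
    return tokens, seg_ids


def block_causal_mask(seg_ids):
    """Attention side: position i may attend to j only if j <= i and both share a segment."""
    n = len(seg_ids)
    return [[j <= i and seg_ids[i] == seg_ids[j] for j in range(n)] for i in range(n)]


def linear_scan(xs, seg_ids=None, decay=0.5):
    """Toy recurrent layer (assumption: stands in for a state-space layer).

    If seg_ids is given, the hidden state is reset at each sequence boundary.
    If it is omitted, state carries over across packed sequences.
    """
    h, out, prev = 0.0, [], None
    for t, x in enumerate(xs):
        if seg_ids is not None:
            if seg_ids[t] != prev:
                h = 0.0  # new sequence starts here: drop the carried state
            prev = seg_ids[t]
        h = decay * h + x
        out.append(h)
    return out


# Two short sequences packed into a single row.
tokens, seg_ids = pack([[1.0, 2.0], [4.0]])
mask = block_causal_mask(seg_ids)

leaky = linear_scan(tokens)           # no boundary handling
clean = linear_scan(tokens, seg_ids)  # state reset at boundaries

print(leaky[-1])  # 5.25: the second sequence sees state left over from the first
print(clean[-1])  # 4.0: the second sequence starts from a fresh state
```

The mask alone fixes nothing for the recurrent path. Each layer type in a hybrid stack needs its own boundary handling, and a missing reset fails silently because the packed batch still trains.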
Sequence packing: limitations for hybrid models like Jamba