FlashAttention and KV Caching for Improved Model Performance
By Global AI News Aggregator
No, absolutely. But for this it may be worthwhile to adopt methods that don't negatively impact modeling performance, like FlashAttention and KV caching.
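Both techniques are exact: they speed up attention without changing its outputs. KV caching in particular avoids recomputing keys and values for past tokens during autoregressive decoding by storing them once. The sketch below is a minimal single-head illustration in NumPy (not from the original post; the `KVCache` class and its names are purely illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class KVCache:
    """Minimal single-head KV cache for autoregressive attention.

    At each decoding step we append the new token's key and value,
    then attend the new query over all cached steps, so past keys
    and values are never recomputed.
    """
    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, q, k, v):
        # q, k, v: 1-D arrays of dimension d for the current token.
        self.keys.append(k)
        self.values.append(v)
        K = np.stack(self.keys)             # (t, d) — all keys so far
        V = np.stack(self.values)           # (t, d) — all values so far
        scores = K @ q / np.sqrt(len(q))    # (t,) scaled dot products
        w = softmax(scores)
        return w @ V                        # (d,) attention output
```

Because the cached computation is identical to recomputing attention over the full prefix, the output at each step matches naive causal attention exactly; only the redundant work is removed.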