AI Dynamics

Global AI News Aggregator


FlashAttention is less useful when MLP GEMMs dominate latency

Maybe FlashAttention isn't that useful when MLP GEMMs eat up so much of the latency? Interesting graph in the latest blog post from @gpus_go_brrr!

Original post on X by @aymericroucher
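The intuition here is essentially Amdahl's law: speeding up only the attention core cannot do much if most of each layer's time goes to the MLP (and projection) GEMMs. The sketch below is illustrative only and does not reproduce the graph from the post. It estimates the attention core's share of per-token forward FLOPs for a standard decoder layer and then applies Amdahl's law. The layer shape (Q/K/V/O projections plus a two-GEMM MLP with a 4x hidden size), the `d_model=4096` / `seq_len=2048` configuration, and the 3x attention speedup are all hypothetical assumptions. FLOP share is also only a rough proxy for latency share, since FlashAttention mainly reduces memory traffic rather than FLOPs.

```python
def attention_flop_share(d_model: int, seq_len: int, mlp_ratio: int = 4) -> float:
    """Rough fraction of per-token, per-layer forward FLOPs in the attention core.

    Assumption (hypothetical layer shape): Q/K/V/O projections are four
    d_model x d_model GEMMs, and the MLP is two GEMMs with hidden size
    mlp_ratio * d_model. Counts 2 FLOPs per multiply-add and ignores
    causal masking, softmax, and normalization costs.
    """
    proj = 2 * 4 * d_model * d_model                  # Q, K, V, O projections
    mlp = 2 * 2 * d_model * (mlp_ratio * d_model)     # up- and down-projection GEMMs
    attn_core = 2 * 2 * seq_len * d_model             # q.K^T scores plus weighted sum over V
    return attn_core / (proj + mlp + attn_core)


def overall_speedup(attn_fraction: float, attn_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the attention share is accelerated.

    The remaining (1 - attn_fraction) of the time, e.g. MLP GEMMs, is unchanged.
    """
    if not 0.0 <= attn_fraction <= 1.0:
        raise ValueError("attn_fraction must be in [0, 1]")
    if attn_speedup <= 0:
        raise ValueError("attn_speedup must be positive")
    return 1.0 / ((1.0 - attn_fraction) + attn_fraction / attn_speedup)


if __name__ == "__main__":
    # Hypothetical configuration, not taken from the post.
    share = attention_flop_share(d_model=4096, seq_len=2048)
    print(f"attention-core FLOP share: {share:.3f}")  # 1/13, about 0.077
    # Even a hypothetical 3x faster attention core barely moves the total.
    print(f"end-to-end speedup: {overall_speedup(share, 3.0):.3f}x")  # 39/37, about 1.054
```

Under these assumptions the attention-core share simplifies to `seq_len / (6 * d_model + seq_len)`. That is why attention optimizations matter more at long context, and why they matter less when MLP GEMMs dominate at shorter sequence lengths.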