AI Dynamics

Global AI News Aggregator

Flash Attention: Optimizing Hardware Memory Use and I/O

Oh yeah, these methods are orthogonal. Flash attention is essentially optimizing hardware memory use and I/O.
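
To illustrate the I/O point: FlashAttention-style kernels compute exact attention and only reorder the work, so the full score vector (or matrix) never has to be held in memory at once. Below is a minimal pure-Python sketch for a single query, not the actual GPU kernel. The function names `naive_attention` and `tiled_attention` and the `block_size` parameter are hypothetical, chosen here for illustration. It shows the running-max and running-sum ("online softmax") bookkeeping that lets keys and values be processed block by block while giving the same result as the straightforward version.

```python
import math


def naive_attention(q, keys, values):
    """Reference version: compute every score first, then softmax, then mix values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def tiled_attention(q, keys, values, block_size=2):
    """Block-wise version: only one block of scores exists at any time.

    Assumption: block_size stands in for whatever fits in fast on-chip memory.
    """
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = float("-inf")  # largest score seen so far
    running_sum = 0.0            # softmax denominator, relative to running_max
    acc = [0.0] * dim            # unnormalized output, relative to running_max

    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]

        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new max (0.0 on the first block).
        correction = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]

        running_sum = running_sum * correction + sum(weights)
        acc = [
            a * correction + sum(w * v[j] for w, v in zip(weights, v_blk))
            for j, a in enumerate(acc)
        ]
        running_max = new_max

    return [a / running_sum for a in acc]
```

Because the tiled version returns the same numbers as the naive one (up to floating-point rounding), it changes how memory is accessed rather than what is computed, which is why it can be combined with methods that change the attention computation itself.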

→ View original post on X — @rasbt