Flash Attention: Optimizing Hardware Memory Use and I/O
Oh yeah, these methods are orthogonal. FlashAttention is essentially an optimization of hardware memory use and I/O: it computes exact attention, just tiled so the full attention matrix never has to be written out to and read back from GPU memory, which is why it composes with the other techniques.
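As a rough illustration of that point, here's a minimal NumPy sketch of the tiling + online-softmax idea behind FlashAttention (purely illustrative: the `tiled_attention` name and block size are my own, and the real kernels also tile the queries and fuse everything into a single on-chip pass):

```python
import numpy as np

def tiled_attention(Q, K, V, block=64):
    """Attention computed over K/V tiles with an online softmax.

    The full (n x n) score matrix is never materialized, which is
    where FlashAttention's memory and I/O savings come from.
    """
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)   # running row-wise max of the scores
    l = np.zeros(n)           # running softmax denominator
    for start in range(0, n, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = (Q @ Kb.T) * scale                 # scores for this tile only
        m_new = np.maximum(m, S.max(axis=1))   # updated running max
        p = np.exp(S - m_new[:, None])         # unnormalized tile probabilities
        corr = np.exp(m - m_new)               # rescale earlier partial sums
        l = l * corr + p.sum(axis=1)
        out = out * corr[:, None] + p @ Vb
        m = m_new
    return out / l[:, None]
```

Because the result is mathematically identical to plain softmax attention (you can check it against `softmax(Q @ K.T / sqrt(d)) @ V` on small inputs), it stacks cleanly with the other methods rather than competing with them.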