Global AI News Aggregator
Yes, that's fair. In big-O terms, the separate QKV should be faster.
→ View original post on X — @rasbt