AI Dynamics

Global AI News Aggregator

GQA Architecture in 70B Language Models Explained

Oh wait, this article is correct for a single layer of a 13B model. In fact, it's only the 70B model that uses GQA, AFAICT! 😀

→ View original post on X — @jeremyphoward
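
For readers who have not met the term, GQA (grouped-query attention) lets several query heads share one key/value head, which shrinks the key/value cache. The sketch below is only an illustration of that idea: the helper names and the head counts (64 query heads, 8 key/value heads, head dimension 128) are assumptions chosen for the example and are not taken from the post or the article it refers to.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the index of the key/value head that a given query head reads from."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("query heads must divide evenly into key/value groups")
    group_size = n_q_heads // n_kv_heads  # query heads per shared KV head
    return q_head // group_size


def kv_cache_elements_per_token(n_kv_heads: int, head_dim: int) -> int:
    """Count cached values per token for one layer (keys plus values)."""
    return 2 * n_kv_heads * head_dim


# Illustrative configuration (assumed, not from the post).
N_Q_HEADS = 64
N_KV_HEADS = 8
HEAD_DIM = 128

# Which KV head each query head uses under GQA.
mapping = [kv_head_for_query_head(h, N_Q_HEADS, N_KV_HEADS) for h in range(N_Q_HEADS)]

# Standard multi-head attention keeps one KV head per query head.
mha_cache = kv_cache_elements_per_token(N_Q_HEADS, HEAD_DIM)
# GQA keeps only the shared KV heads.
gqa_cache = kv_cache_elements_per_token(N_KV_HEADS, HEAD_DIM)

print("query heads 0-7 use KV head", mapping[0])
print("query heads 8-15 use KV head", mapping[8])
print("per-token, per-layer cache shrinks by a factor of", mha_cache // gqa_cache)
```

With these assumed numbers, each group of eight query heads shares one key/value head, so the per-layer cache is one eighth the size of the plain multi-head case. Setting the number of key/value heads equal to the number of query heads recovers ordinary multi-head attention.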