AI Dynamics

Global AI News Aggregator

GQA Optimization: K and V Attention Projections Explained

Ah well, it's certainly done way better than the others at finding something relevant! However, it looks like this article might be mistaken in saying the 4 attention projections are the same. IIRC, the GQA (grouped-query attention) optimization only applies to K and V, which we see here:
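As a rough illustration of that point (a sketch, not the material the post refers to): under grouped-query attention, several query heads share one key/value head, so the K and V projections produce fewer output features than the Q and output projections. The function name, the `q_proj`/`k_proj`/`v_proj`/`o_proj` labels, and the sizes below are assumptions chosen for the example, not values taken from the post or the article it discusses.

```python
def attention_projection_shapes(d_model: int, n_heads: int, n_kv_heads: int) -> dict:
    """Return (in_features, out_features) for each attention projection under GQA.

    Hypothetical helper for illustration only; the labels follow a common
    naming convention and are not tied to any specific library.
    """
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")

    head_dim = d_model // n_heads
    return {
        # Q keeps one projection slice per query head.
        "q_proj": (d_model, n_heads * head_dim),
        # K and V are shared across groups of query heads, so they shrink.
        "k_proj": (d_model, n_kv_heads * head_dim),
        "v_proj": (d_model, n_kv_heads * head_dim),
        # The output projection maps the concatenated query heads back to d_model.
        "o_proj": (n_heads * head_dim, d_model),
    }


# Illustrative sizes (assumed): 32 query heads sharing 8 key/value heads.
shapes = attention_projection_shapes(d_model=4096, n_heads=32, n_kv_heads=8)
for name, (n_in, n_out) in shapes.items():
    print(f"{name}: {n_in} -> {n_out}")
```

With these assumed sizes, only `k_proj` and `v_proj` come out smaller than `d_model`, which is why the four projections are not interchangeable. Setting `n_kv_heads` equal to `n_heads` recovers standard multi-head attention, where all four do match.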

→ View original post on X: @jeremyphoward
