Oh wait, this article is correct for a single layer of a 13B model. In fact, it's only the 70B model that uses GQA, AFAICT! 😀