MHA vs GQA Independence in Olmo 3 Models
By Global AI News Aggregator
Hard to say, but I think it may be independent of MHA (multi-head attention) vs GQA (grouped-query attention). Also, Olmo 3 7B uses MHA, while Olmo 3 32B uses GQA.
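
For anyone less familiar with the two variants, here is a rough sketch of the structural difference. In GQA, several query heads share one key/value head, and MHA is the special case where every query head has its own. The helper names and the head counts below are hypothetical illustration values, not the actual Olmo 3 configurations.

```python
def kv_head_for_query_head(query_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Return the index of the key/value head that a given query head reads from."""
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    group_size = n_heads // n_kv_heads  # query heads sharing one KV head
    return query_head // group_size


def kv_cache_elements_per_token(n_kv_heads: int, head_dim: int) -> int:
    """Count cached elements per token per layer: one key and one value vector per KV head."""
    return 2 * n_kv_heads * head_dim


# Hypothetical sizes, chosen only to keep the output readable.
N_HEADS = 8
HEAD_DIM = 64

# MHA: as many KV heads as query heads, so the mapping is one-to-one.
mha_map = [kv_head_for_query_head(q, N_HEADS, N_HEADS) for q in range(N_HEADS)]

# GQA with 2 KV heads: each KV head serves a group of 4 query heads.
gqa_map = [kv_head_for_query_head(q, N_HEADS, 2) for q in range(N_HEADS)]

print("MHA mapping:", mha_map)
print("GQA mapping:", gqa_map)
print("MHA cache elements per token:", kv_cache_elements_per_token(N_HEADS, HEAD_DIM))
print("GQA cache elements per token:", kv_cache_elements_per_token(2, HEAD_DIM))
```

The only thing that changes between the two is the number of KV heads, which is why GQA mainly shrinks the KV cache rather than altering how the query heads themselves work.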