AI Dynamics

Global AI News Aggregator

MHA vs GQA Independence in Olmo 3 Models

Hard to say, but I think it may be independent of MHA vs GQA. Also, Olmo 3 7B uses MHA, whereas Olmo 3 32B uses GQA.

→ View original post on X (@rasbt)
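For readers unfamiliar with the distinction: in multi-head attention (MHA) every query head has its own key/value head, while in grouped-query attention (GQA) several query heads share one key/value head. MHA is the special case where the number of KV heads equals the number of query heads. The minimal sketch below shows the query-to-KV head mapping and the resulting KV-cache size. The function names and all head counts are hypothetical, chosen for illustration; they are not Olmo 3's actual configuration.

```python
def kv_head_for_query_head(q_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Return the key/value head index that a given query head uses."""
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    group_size = n_heads // n_kv_heads  # query heads sharing one KV head
    return q_head // group_size


def kv_cache_elements(n_layers: int, n_kv_heads: int, head_dim: int,
                      seq_len: int, batch: int = 1) -> int:
    """Number of cached elements; the factor 2 covers both keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch


# Hypothetical model dimensions (illustration only, not Olmo 3's config).
N_HEADS = 32
N_KV_HEADS_GQA = 8
HEAD_DIM = 128
N_LAYERS = 32
SEQ_LEN = 4096

# MHA: n_kv_heads == n_heads, so each query head maps to its own KV head.
mha = [kv_head_for_query_head(h, N_HEADS, N_HEADS) for h in range(N_HEADS)]
# GQA: 32 query heads share 8 KV heads, i.e. groups of 4.
gqa = [kv_head_for_query_head(h, N_HEADS, N_KV_HEADS_GQA) for h in range(N_HEADS)]

ratio = (kv_cache_elements(N_LAYERS, N_HEADS, HEAD_DIM, SEQ_LEN)
         / kv_cache_elements(N_LAYERS, N_KV_HEADS_GQA, HEAD_DIM, SEQ_LEN))

print(mha[:8])  # [0, 1, 2, 3, 4, 5, 6, 7]
print(gqa[:8])  # [0, 0, 0, 0, 1, 1, 1, 1]
print(ratio)    # 4.0
```

With these assumed numbers, GQA shrinks the KV cache by a factor of n_heads / n_kv_heads (here 4), which is the usual motivation for using it in larger models.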
