AI Dynamics

Global AI News Aggregator

128k Context Window Ablation Study on Billion-Parameter MoE Models

any references come to mind for 128k context on a 1Bish MoE? not sure i understand this area very well. i guess the primary ablation i would want is on context length quality vs width dimension…?

→ Original post on X by @swyx
