AI Dynamics

Global AI News Aggregator

Overtraining of 13B Model vs Suboptimal 65B Production Deployment

So, to answer the original question: they overtrained the 13B model but not the 65B model, likely because the training budget was fixed in advance. As a result, running the 65B model in production is suboptimal.
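A minimal sketch of what "overtrained" means here, assuming the widely cited compute-optimal rule of thumb from DeepMind's Chinchilla scaling study (roughly 20 training tokens per parameter). The post does not give actual token counts, so the shared budget of 1.0e12 tokens below is hypothetical and chosen only for illustration. The names `tokens_per_param`, `classify` and `CHINCHILLA_TOKENS_PER_PARAM` are likewise illustrative. The sketch shows how a single fixed budget pushes the 13B model far past the heuristic while leaving the 65B model short of it. It also uses the standard approximation of about 2N FLOPs per token for a forward pass to show why the larger model costs more to serve.

```python
# Assumption: ~20 tokens per parameter is "compute-optimal" (Chinchilla rule of thumb).
CHINCHILLA_TOKENS_PER_PARAM = 20.0


def tokens_per_param(n_params: float, n_tokens: float) -> float:
    """Ratio of training tokens to model parameters."""
    return n_tokens / n_params


def classify(n_params: float, n_tokens: float,
             optimal_ratio: float = CHINCHILLA_TOKENS_PER_PARAM) -> str:
    """Label a training run relative to the assumed compute-optimal ratio."""
    ratio = tokens_per_param(n_params, n_tokens)
    if ratio > optimal_ratio:
        return "overtrained"
    if ratio < optimal_ratio:
        return "undertrained"
    return "compute-optimal"


def inference_flops_per_token(n_params: float) -> float:
    """Approximate forward-pass cost: about 2 FLOPs per parameter per token."""
    return 2.0 * n_params


if __name__ == "__main__":
    # Hypothetical token budget shared by both models (not from the post).
    budget_tokens = 1.0e12
    for name, params in [("13B", 13e9), ("65B", 65e9)]:
        ratio = tokens_per_param(params, budget_tokens)
        print(f"{name}: {ratio:.1f} tokens/param -> {classify(params, budget_tokens)}")

    # Per-token serving cost of the larger model relative to the smaller one.
    cost_ratio = inference_flops_per_token(65e9) / inference_flops_per_token(13e9)
    print(f"65B vs 13B inference cost per token: {cost_ratio:.1f}x")
```

Under these assumptions the 13B run lands near 76.9 tokens per parameter and the 65B run near 15.4, while every token served by the 65B model costs about 5x as much. That is one way to read the trade-off the post describes.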

→ View original post on X: @alexjc
