AI Dynamics

Global AI News Aggregator

Overtraining of 13B Model vs Suboptimal 65B Production Deployment

To answer the original question: they overtrained the 13B model but not the 65B model, likely because the training budget was fixed beforehand. The 65B is therefore a suboptimal choice to run in production, since for its inference cost it was not trained as far as it could have been.
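
As a rough illustration of what "overtrained" means here, the sketch below compares each model's training tokens per parameter against a compute-optimal rule of thumb. The post gives no token counts, so the budgets of 1.0T and 1.4T tokens are illustrative assumptions, as is the ratio of roughly 20 tokens per parameter; the function name `overtraining_factor` is hypothetical.

```python
# Assumption: a compute-optimal rule of thumb of roughly 20 training tokens
# per parameter. This is a heuristic, not a figure taken from the post.
TOKENS_PER_PARAM_OPTIMAL = 20.0


def overtraining_factor(params: float, tokens: float,
                        optimal_ratio: float = TOKENS_PER_PARAM_OPTIMAL) -> float:
    """Return how many times the rule-of-thumb token budget was used.

    A value near 1.0 means roughly compute-optimal training; a value well
    above 1.0 means the model was trained past that point ("overtrained").
    """
    return (tokens / params) / optimal_ratio


# Assumption: illustrative (parameter count, training tokens) pairs.
# The post does not state the token budgets.
models = {
    "13B": (13e9, 1.0e12),
    "65B": (65e9, 1.4e12),
}

factors = {name: overtraining_factor(p, t) for name, (p, t) in models.items()}

for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x the rule-of-thumb token budget")
```

Under these assumed numbers the 13B model sits several times past the rule-of-thumb budget while the 65B model sits close to it, which is the asymmetry the post describes.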

→ View original post on X by @alexjc