AI Dynamics

Global AI News Aggregator

LLaMA 65B Evaluation Discrepancies Between Implementations

22/ Say you've trained a perfect LLaMA 65B reproduction & evaluated it with EAI harness (score 0.488). Comparing it to the published number (evaluated w. original implementation, score 0.637), it's a 30% difference so you're likely thinking "Oh no" But these numbers are…

→ View original post on X (@thom_wolf)
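
The "30% difference" in the post can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `relative_difference` is hypothetical, and the post does not say which score is the baseline, so both are computed. The quoted 30% appears to correspond to measuring the gap relative to the harness score (0.488); measured against the published score (0.637) the gap comes out closer to 23%.

```python
def relative_difference(reference: float, other: float) -> float:
    """Return |reference - other| as a fraction of reference (hypothetical helper)."""
    return abs(reference - other) / reference


harness_score = 0.488    # EAI harness score quoted in the post
published_score = 0.637  # published score quoted in the post

gap = published_score - harness_score
vs_harness = relative_difference(harness_score, published_score)
vs_published = relative_difference(published_score, harness_score)

print(f"absolute gap: {gap:.3f}")
print(f"relative to harness score: {vs_harness:.1%}")
print(f"relative to published score: {vs_published:.1%}")
```

The choice of baseline alone moves the headline figure by several percentage points, which is worth keeping in mind before reading either number as a statement about model quality.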
