LLaMA 65B Evaluation Discrepancies Between Implementations

26 June 2023 15h43

22/ Say you've trained a perfect LLaMA 65B reproduction and evaluated it with the EleutherAI (EAI) harness (score 0.488). Comparing it to the published number (evaluated with the original implementation, score 0.637), it's a roughly 30% relative difference, so you're likely thinking "Oh no". But these numbers are…
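The "30%" figure can be checked with a few lines of arithmetic. The sketch below assumes the percentage is a relative difference taken against the harness score (0.488); the post does not state which baseline was used, and measuring against the published score (0.637) gives a smaller figure. The function and variable names are illustrative only.

```python
def relative_difference(value: float, baseline: float) -> float:
    """Return (value - baseline) / baseline as a fraction."""
    return (value - baseline) / baseline


# Scores quoted in the post.
eai_harness_score = 0.488    # EAI harness evaluation
original_impl_score = 0.637  # published number, original implementation

# Absolute gap in score points.
absolute_gap = original_impl_score - eai_harness_score

# Assumption: the post's "30%" uses the harness score as the baseline.
gap_vs_harness = relative_difference(original_impl_score, eai_harness_score)

# Alternative reading: harness score measured against the published score.
gap_vs_published = relative_difference(eai_harness_score, original_impl_score)

print(f"absolute gap:          {absolute_gap:.3f}")      # 0.149
print(f"relative to harness:   {gap_vs_harness:.1%}")    # 30.5%
print(f"relative to published: {gap_vs_published:.1%}")  # -23.4%
```

Under the first reading the gap rounds to the 30% quoted in the post; under the second it is closer to 23%, so the choice of baseline matters when reporting such comparisons.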

Original post on X by @thom_wolf, 26 June 2023
