AI Dynamics

Global AI News Aggregator

MMLU Evaluation Method: Log Probability vs Multiple Choice

Nice! Btw it's possible (in principle) to also evaluate MMLU in the same way I evaluate HellaSwag, where you swap out the 4 continuations in turn and predict the one with highest average log prob. Though it hurts the model by a few percent because it can't reason by elimination.
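
As a rough illustration of the scoring rule described in the post, here is a minimal sketch. It assumes the per-token log probabilities for each of the 4 continuations have already been computed by a language model given the question as context. The function names (`average_logprob`, `pick_continuation`) and the numbers are hypothetical, made up for this example; they do not come from the original post or from any particular evaluation library.

```python
from typing import Sequence


def average_logprob(token_logprobs: Sequence[float]) -> float:
    """Mean log probability per token, so longer continuations are not penalized."""
    if not token_logprobs:
        raise ValueError("continuation has no tokens")
    return sum(token_logprobs) / len(token_logprobs)


def pick_continuation(per_option_logprobs: Sequence[Sequence[float]]) -> int:
    """Return the index of the continuation with the highest average log prob."""
    scores = [average_logprob(lp) for lp in per_option_logprobs]
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up per-token log probs for 4 candidate continuations (assumption:
# in a real evaluation these come from the model's output distribution).
example = [
    [-2.0, -1.5, -3.0],        # option A
    [-0.5, -0.7],              # option B
    [-1.0, -1.0, -1.0, -1.0],  # option C
    [-0.4, -2.6],              # option D
]

predicted = pick_continuation(example)
print("predicted option index:", predicted)
```

Because each continuation is scored in isolation, the model never sees the other options side by side, which is consistent with the post's remark that it cannot reason by elimination in this setup.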

→ View original post on X (@karpathy)
