AI Dynamics

Global AI News Aggregator

Language Model Evaluation Tasks and Benchmarking Methodologies Comparison

Oh yeah. I know the {0,1,N}-shot tasks in LM harness and in the PaLM/GPT-3 evals are very similar, modulo some prompting differences. I don't exactly mean to say the PaLM evals are better than that. I was just referring to the academic tasks in general (not specifically LM harness). Im

→ Original post on X: @yitayml
