AI Dynamics

Global AI News Aggregator

BIG-Bench metrics deserve deeper analysis and study

(3) Even just looking at BIG-Bench metrics is quite understudied IMO. There are hundreds of tasks in BIG-Bench, and each task has dozens of models evaluated, each with many evaluation metrics. There are task logs for some models. This raises natural questions:

Source: original post on X by @_jasonwei
