AI Dynamics

Global AI News Aggregator

Systematic Blind Evaluation Benchmarks for LLMs Urgently Needed

Friendly reminder to everyone that there isn't a good, proper systematic blind eval/benchmark of LLMs yet, especially one built on real-world data and use cases. If I were in academia, this is something I'd work on immediately.

→ View original post on X: @yitayml
