AI Dynamics

Global AI News Aggregator

LMArena Benchmarking Gamed by Sycophantic AI Responses

It is kind of amazing how many good benchmarking tools have been saturated or gamed (whether by accident or on purpose) in the past few months. LMArena really seemed like a good method, but then it turned out that you could just go full sycophantic and people loved it.

→ View original post on X: @emollick
