AI Dynamics

Global AI News Aggregator

New favorite LLM test reveals inconsistent performance across SOTA models

Wow, this has just become my favorite LLM test. I hadn't realized that this doesn't work, but it really doesn't, even for SOTA LLMs. It seems to be a bit hit and miss: GPT-4o failed 1 of 3 times, while Claude failed 3 of 3 times.

→ View original post on X (@karpathy)
