AI Dynamics

Global AI News Aggregator

Comparing LLMs on identifying their version

This is such a vibes-based eval, but the first prompt I give any new LLM is “Which version is this?” and DeepSeek-V3 nailed it. See below for how Claude, Gemini, ChatGPT, and Grok fare on the same prompt. TL;DR: it’s all over the map.

→ View original post on X — @goodside