To be clear, this isn’t a test of the LLMs themselves, but of how each vendor presents them in its most consumer-facing UI. I assume many of these are getting the answer from the system prompt. My point is just that it isn’t done consistently, and I liked that DeepSeek bothered.
Inconsistent LLM presentation and system prompts