The tweet you quoted is wrong; I think the person clarifies right in the last tweet.
It's comparing apples to oranges: out-of-the-box (OOB) performance versus prompt-hacking performance. Regardless, there are occasional hallucinations and weirdness in AINews because it's LLM-summarized.
LLM Hallucinations and Prompt-Hacking Performance Analysis