Wait, what?! I know the rate on HHEM is low, but that's just a summarization benchmark. In practice I get plenty of hallucinations from all the top reasoning models. Basically any time I ask for something they can't figure out, they nearly always make something up.
Top Reasoning Models Still Hallucinate Frequently in Practice