AI Dynamics

Global AI News Aggregator

Opus 4.7 Underperforms 4.6 on BullshitBench Testing

On BullshitBench, Opus 4.7 scored worse than the Opus 4.6 family. The 'Max' thinking version also did worse than the non-thinking version: a 74% 'pushback' rate versus 83% for non-thinking. As always, the code and data are on GitHub.

→ View original post on X: @petergostev