Quick test of Opus 4.1 vs Opus 4.0 and other models. Using prompt from @FeatureCrewPod as well as a new prompt for Colosseum simulation.
Peter Gostev (@petergostev), August 5, 2025
My sense is that Opus 4.0 was kind of busted for these tests – too many errors and quality was ok. 4.1 is not at around Sonnet level, so… pic.twitter.com/fYb5kD9ExE