I'm tired of models that are over-optimized for benchmarks without actually being better. I think this one is genuinely better, especially on tasks outside coding and reasoning, such as writing.
Models Over-Optimized for Benchmarks vs. Real Performance