Many of the academic papers shared on X are benchmarking papers. These benchmarks are deliberately built so that current AIs fail often; otherwise they could not measure future progress. When reading one, pay attention to how realistic the benchmark is, the relative rankings of the models, and the prompts and tools given to the AI.
Academic AI Benchmarks: Realism, Rankings, and Methodology Critique