As models overfit to benchmarks, @alexgshaw of @LaudeInstitute is thinking about the problem this way: “how can we build the benchmark factory – the machine that other people can use to make their benchmarks – as opposed to just creating our own benchmarks one-by-one?” Enter… pic.twitter.com/3lre0GRO3G
Snorkel AI (@SnorkelAI), April 8, 2026