Here is an example of how I plan to test next-generation models: a GPT that writes academic papers when given a dataset. With GPT-4 it almost, but not quite, works well. I am curious how much closer GPT-5 gets. Idiosyncratic benchmarks are useful.
Testing Next-Generation GPT Models with Idiosyncratic Benchmarks