Trouble in BIG-Bench paradise? – @ErnestSDavis looks at 48 of the benchmarks within and finds problems with most: https://cs.nyu.edu/~davise/Benchmarks/BigBenchDiscussion.html … – Many project AGI timelines based on performance on these benchmarks. If the benchmarks aren’t valid, consequent timelines are problematic
BIG-Bench Benchmarks Face Validity Scrutiny for AGI Predictions