Exactly, except with the opposite conclusion. Every study so far that has tried to test GPT-N for actual generalization has found that it scores no better than chance on genuinely new problems, brand-new coding problems in particular. This is also why it can't do ARC.
GPT models fail at genuine generalization on novel problems