This particular view is a decade out of date. You know that next-token prediction is mostly self-supervised, so the problem-solution pairs are generated "automatically" from raw text? Training a model to output "I don't know the next token" the same way is a minor reframe, and plenty of research on it already exists.
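To make the "automatic pairs" point concrete, here is a minimal sketch of how raw text yields (context, next token) training pairs with no human labeling, and how the same pass can emit an "I don't know" target. The `<idk>` token, the `min_confidence` threshold, and the rule "abstain when the corpus disagrees about what follows a context" are assumptions made up for illustration, not a specific published method.

```python
from collections import Counter, defaultdict

IDK = "<idk>"  # hypothetical abstain token, chosen for this sketch


def make_pairs(tokens, context_size=2):
    """Slide a window over raw tokens to get (context, next_token) pairs."""
    return [
        (tuple(tokens[i - context_size:i]), tokens[i])
        for i in range(context_size, len(tokens))
    ]


def relabel_with_idk(pairs, min_confidence=0.75):
    """Swap the target for IDK when the corpus disagrees on what follows a context.

    Assumed rule: if the most frequent follower of a context accounts for
    less than min_confidence of its occurrences, the target becomes IDK.
    """
    followers = defaultdict(Counter)
    for context, target in pairs:
        followers[context][target] += 1

    relabeled = []
    for context, target in pairs:
        counts = followers[context]
        top_share = max(counts.values()) / sum(counts.values())
        relabeled.append((context, target if top_share >= min_confidence else IDK))
    return relabeled


tokens = "the cat sat on the mat and the cat ran to the mat".split()
pairs = make_pairs(tokens, context_size=2)
idk_pairs = relabel_with_idk(pairs)

for context, target in idk_pairs:
    print(" ".join(context), "->", target)
```

In this toy corpus the context "the cat" is followed once by "sat" and once by "ran", so both of its pairs get the `<idk>` target, while every other context keeps its original next token. Both label sets come from the same unlabeled text, which is the sense in which the abstain objective is only a reframe of the usual one.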
Next-Token Prediction, Self-Supervised Learning, and Uncertainty