A prediction problem in a discrete and finite domain is easy because we know how to represent uncertainty in the prediction.
If there are 30,000 possible tokens, we output a 30k-dimensional vector of values between 0 and 1 that sum to 1 (thank you, softmax).
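To make that concrete, here is a minimal sketch of softmax in plain Python. The five-token vocabulary and the `logits` values are made up for illustration; a real model would produce one score per token over the full 30,000-entry vocabulary.

```python
import math

def softmax(logits):
    """Map arbitrary real-valued scores to a probability distribution."""
    # Subtracting the max before exponentiating avoids overflow
    # and does not change the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for a toy vocabulary of 5 tokens.
logits = [2.0, 1.0, 0.1, -1.0, -3.0]
probs = softmax(logits)

# Every entry lies in [0, 1] and the entries sum to 1 (up to float error).
print(probs)
print(sum(probs))
```

The output vector is the representation of uncertainty: a confident prediction puts most of the mass on one token, while an uncertain one spreads it out.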
We can't output a
Predicting Uncertainty in Discrete Token Probability Distributions