It models a distribution over next tokens, yes. In a k-shot prompt, each added example (ideally) constrains that distribution further. You're effectively appending to a search query over the space of possible texts, so it returns a more specific subset of them.
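To make the search-query picture concrete, here is a toy sketch. It is not a language model: the `CORPUS` list, the `answer_distribution` helper, and the example pairs are all made up for illustration, and "conditioning" is faked by filtering the texts that contain every example given so far. The point is only that each added example shrinks the matching subset, so the distribution over the answer gets sharper.

```python
import math
from collections import Counter

# Hypothetical stand-in for "the space of possible texts".
# Each text is one possible continuation pattern for the same three prompts.
CORPUS = [
    "cat -> chat ; dog -> chien ; bird -> oiseau",   # English to French
    "cat -> chat ; dog -> bark ; bird -> tweet",     # loose word association
    "cat -> gato ; dog -> perro ; bird -> pajaro",   # English to Spanish
    "cat -> CAT ; dog -> DOG ; bird -> BIRD",        # uppercase
]


def answer_distribution(examples, query="bird ->"):
    """Distribution over the token following `query`, restricted to
    texts that contain every string in `examples` (the k shots)."""
    counts = Counter()
    for text in CORPUS:
        if query in text and all(ex in text for ex in examples):
            # First whitespace-separated token after the query.
            answer = text.split(query, 1)[1].split()[0]
            counts[answer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: c / total for tok, c in counts.items()}


def entropy(dist):
    """Shannon entropy in bits; lower means a more constrained distribution."""
    return sum(p * math.log2(1 / p) for p in dist.values())


shots = ["cat -> chat", "dog -> chien"]
for k in range(len(shots) + 1):
    dist = answer_distribution(shots[:k])
    print(f"k={k}: {dist} entropy={entropy(dist):.2f} bits")
```

In this toy setup, zero shots leave four equally likely answers, one shot ("cat -> chat") still leaves two, and the second shot pins it down to one. A real model does this softly over probabilities rather than by hard filtering, which is where the "(ideally)" comes in.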
k-shot learning constrains distribution over next tokens