By smaller events here, we refer to the probability of a token given the tokens that precede it, p(c|ab). In probabilistic language modeling, a “token” is a single unit of text, such as a word or part of a word. Modern language models use vocabularies of roughly 100K tokens.
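As a concrete illustration, a conditional token probability like p(c|ab) can be estimated from counts in a corpus. The sketch below uses a tiny, made-up toy corpus and a hypothetical helper `p_next`; real language models learn these probabilities with neural networks over far larger vocabularies, but the quantity being modeled is the same.

```python
from collections import Counter, defaultdict

# Toy tokenized corpus (hypothetical example data).
corpus = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "dog", "sat"],
]

# For each prefix (context), count which token follows it.
counts = defaultdict(Counter)
for tokens in corpus:
    for i in range(1, len(tokens)):
        context = tuple(tokens[:i])
        counts[context][tokens[i]] += 1

def p_next(context, token):
    """Estimate p(token | context) from corpus counts."""
    c = counts[tuple(context)]
    total = sum(c.values())
    return c[token] / total if total else 0.0

# After "the cat", the corpus continues with "sat" once and "ran" once.
print(p_next(["the", "cat"], "sat"))  # 0.5
```

Here the context is the full prefix; n-gram models truncate it to the last n-1 tokens, and neural models compress it into a learned hidden state instead of explicit counts.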