19/ In this case, we use probabilities again, but this time we compute the probability of generating the full answer sequence rather than just the letter: we sum the log-probabilities of the answer tokens and normalize by dividing by the number of tokens, so longer sequences are not penalized.
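A minimal sketch of this length-normalized scoring, assuming a Hugging Face causal LM (the model name, prompt, and candidate answers below are illustrative, not from the original): sum the log-probabilities of the answer tokens given the prompt, then divide by the number of answer tokens.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()


def normalized_answer_logprob(prompt: str, answer: str) -> float:
    """Average log-probability per answer token, given the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    answer_ids = tokenizer(answer, return_tensors="pt").input_ids

    # Full sequence: prompt followed by the candidate answer.
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)

    with torch.no_grad():
        logits = model(input_ids).logits  # (1, seq_len, vocab)

    log_probs = torch.log_softmax(logits, dim=-1)
    n_answer = answer_ids.shape[1]

    # Logits at position i predict token i+1, so take the positions
    # that predict each answer token and pick out those tokens' log-probs.
    answer_positions = log_probs[0, prompt_ids.shape[1] - 1 : -1, :]
    token_log_probs = answer_positions.gather(
        1, answer_ids[0].unsqueeze(-1)
    ).squeeze(-1)

    # Sum of token log-probs, normalized by answer length.
    return token_log_probs.sum().item() / n_answer


# Usage: pick the candidate answer with the highest normalized score.
prompt = "Question: What is the capital of France?\nAnswer:"
candidates = [" Paris", " London", " Berlin"]
scores = {c: normalized_answer_logprob(prompt, c) for c in candidates}
print(max(scores, key=scores.get), scores)
```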