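Generating a sequence of 1000 tokens in one shot, with a vocabulary of 100K tokens, would mean picking from an insane 100K^1000 = 10^5000 possible sequences. That's far more than the estimated number of atoms in the observable universe, roughly 10^82! With the chain rule, the model instead picks one token at a time from 100K candidates, so over 1000 steps it scores "only" 100K * 1000 = 100M choices, a much more manageable number.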
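The arithmetic above can be checked with a few lines of Python. This is only a sketch: the names `VOCAB_SIZE` and `SEQ_LEN` are made up for illustration, and the 100K vocabulary is an assumption read off the 100K^1000 figure.

```python
import math

VOCAB_SIZE = 100_000  # assumed vocabulary size (the "100K" in the text)
SEQ_LEN = 1000        # number of tokens to generate

# Choosing the whole sequence at once: one option per possible sequence.
joint_choices = VOCAB_SIZE ** SEQ_LEN

# Express the huge count as a power of ten instead of printing every digit.
joint_exponent = round(math.log10(joint_choices))

# Chain rule: score VOCAB_SIZE candidates at each of SEQ_LEN steps.
chain_rule_choices = VOCAB_SIZE * SEQ_LEN

print(f"all at once:     10^{joint_exponent} possible sequences")
print(f"token by token:  {chain_rule_choices:,} scored candidates")
```

The gap comes from turning a product over positions into a sum over positions: the joint count multiplies 100K by itself 1000 times, while the step-by-step count adds 100K a thousand times.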
Chain Rule Reduces Token Generation Complexity Exponentially