The Chain Rule of Probability is a powerful tool behind recent advances in Large Language Models. It lets us compute the probability of a complex event by breaking it into a sequence of smaller events and multiplying together the probability of each one, conditioned on the events that came before it.
p(a, b, c) = p(c | a, b) * p(b | a) * p(a)
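To make the formula concrete, here is a minimal Python sketch of the three-event case. The probability values are invented purely for illustration, and the helper names `chain_rule` and `chain_rule_log` are hypothetical. A language model can be thought of as supplying one such conditional probability per token, so the same product gives the probability of a whole sequence.

```python
import math

# Hypothetical conditional probabilities for the sequence a, b, c.
# These numbers are made up for illustration only.
p_a = 0.5            # p(a)
p_b_given_a = 0.4    # p(b | a)
p_c_given_ab = 0.25  # p(c | a, b)


def chain_rule(conditionals):
    """Multiply conditional probabilities, ordered from first event to last."""
    joint = 1.0
    for p in conditionals:
        joint *= p
    return joint


def chain_rule_log(conditionals):
    """Return the log of the same product, computed as a sum of logs."""
    return sum(math.log(p) for p in conditionals)


steps = [p_a, p_b_given_a, p_c_given_ab]
joint = chain_rule(steps)          # p(a, b, c)
log_joint = chain_rule_log(steps)  # log p(a, b, c)

print(joint)
print(math.exp(log_joint))
```

The log version is included because a product of many small probabilities can underflow to zero in floating point on long sequences, while a sum of logs stays well behaved.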
Chain Rule of Probability Powers Large Language Models