AI Dynamics

Global AI News Aggregator

Understanding Attention: K and V matrices, masking, and -inf for softmax

This is a must-watch to understand how attention works! Great visualization, explaining:
– Why are there K and V matrices, and what do they represent?
– Why mask the lower left part of the query-key score matrix (QKᵀ)?
– Why apply -inf to the lower left part of the score matrix before softmax rather than just
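
The masking questions above can be illustrated with a minimal NumPy sketch. It is not taken from the video: the function names, the random Q and K inputs, and the matrix sizes are assumptions made for illustration. It also assumes the common layout where rows index queries and columns index keys, so the masked "future" entries sit above the diagonal; a visualization that draws the matrix transposed would show the same entries in the lower left. The sketch compares filling masked scores with -inf against filling them with 0.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability; exp(-inf) evaluates to 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention_weights(scores, fill_value=-np.inf):
    # Hypothetical helper: overwrite future positions, then normalize each row.
    n = scores.shape[0]
    # True strictly above the diagonal: query i must not attend to key j > i.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    masked = np.where(future, fill_value, scores)
    return softmax(masked, axis=-1)

# Illustrative inputs (assumed): 4 tokens, 8-dimensional queries and keys.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
scores = Q @ K.T / np.sqrt(Q.shape[-1])

w_inf = causal_attention_weights(scores)                   # masked with -inf
w_zero = causal_attention_weights(scores, fill_value=0.0)  # masked with 0

future_idx = np.triu_indices(4, k=1)
print(w_inf[future_idx])   # all zeros: future tokens receive no weight
print(w_zero[future_idx])  # all positive: a score of 0 still leaks weight
```

Because softmax exponentiates its inputs, a score of 0 becomes exp(0) = 1 before normalization and still claims a share of the attention, whereas -inf becomes exactly 0. In this sketch every row of `w_inf` still sums to 1, with the weight spread only over the current and earlier positions.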

→ View original post on X — @aymericroucher