I guess there's a tradeoff in how each activation is used: whether it's doing something that's optimal for the current token or for a future one. I'm using "information" and "store" pretty loosely. 🙂
Activation Trade-offs: Current vs Future Token Optimization