First-ever activation-function swapping strategy just dropped for LLMs! Meta’s Stochastic Activation randomly selects between two non-linear functions (ReLU and SiLU) in the feed-forward network (FFN) of an LLM, reaching 90% activation sparsity and a 1.65x speedup on CPU!
Meta’s Stochastic Activation Achieves 90% Activation Sparsity for LLM Speedups
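The announcement does not spell out implementation details, so what follows is only a minimal pure-Python sketch of the idea, under several labeled assumptions: a plain two-layer FFN (most LLMs use gated variants such as SwiGLU instead), one ReLU-or-SiLU coin flip per forward call controlled by a hypothetical probability `p_relu`, and ReLU used unconditionally at inference so that negative pre-activations become exact zeros. All function names here are hypothetical and are not Meta's code.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output unit


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def silu(x: float) -> float:
    # SiLU(x) = x * sigmoid(x), computed with a numerically stable sigmoid
    if x >= 0.0:
        sig = 1.0 / (1.0 + math.exp(-x))
    else:
        z = math.exp(x)
        sig = z / (1.0 + z)
    return x * sig


def stochastic_activation(h: Vector, p_relu: float, training: bool,
                          rng: random.Random) -> Vector:
    """Hypothetical sketch: in training, use ReLU with probability p_relu and
    SiLU otherwise (one draw per call, an assumption); at inference, always
    use ReLU so negative pre-activations become exact zeros."""
    use_relu = (not training) or rng.random() < p_relu
    act = relu if use_relu else silu
    return [act(v) for v in h]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sparse_matvec(w: Matrix, h: Vector) -> Vector:
    """Same result as matvec(w, h), but reads only the columns of w whose
    hidden unit is non-zero; skipping the rest is where sparsity saves work."""
    nz = [j for j, v in enumerate(h) if v != 0.0]
    return [sum(row[j] * h[j] for j in nz) for row in w]


def ffn(x: Vector, w_in: Matrix, w_out: Matrix, p_relu: float,
        training: bool, rng: random.Random) -> Tuple[Vector, Vector]:
    """Two-layer FFN block: W_out * act(W_in * x). Returns (output, hidden)."""
    hidden = stochastic_activation(matvec(w_in, x), p_relu, training, rng)
    return sparse_matvec(w_out, hidden), hidden


def sparsity(h: Vector) -> float:
    """Fraction of hidden units that are exactly zero."""
    return sum(1 for v in h if v == 0.0) / len(h)


if __name__ == "__main__":
    rng = random.Random(0)
    w_in = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5], [-0.5, -0.5]]
    w_out = [[1.0, 1.0, 1.0, 1.0]]
    out, hidden = ffn([2.0, 1.0], w_in, w_out, p_relu=0.5,
                      training=False, rng=rng)
    print(hidden, sparsity(hidden), out)  # prints [1.0, 0.0, 1.5, 0.0] 0.5 [2.5]
```

In this toy inference pass, half of the hidden units are zero, so `sparse_matvec` touches only half of the columns of `w_out`. That skipped work is the mechanism by which high activation sparsity can translate into CPU speedups; the 90% and 1.65x figures are the article's claims and are not reproduced by this sketch.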
