LLMs are usually too large for most contexts, but creating a pruned version of a model usually requires retraining. Here's a straightforward new alternative that scores each weight by the element-wise product of its magnitude and the norm of the corresponding input activations: https://arxiv.org/abs/2306.11695
Pruning Large Language Models Without Retraining Using Activation Norms
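The idea is simple enough to sketch in a few lines. Below is a minimal, illustrative PyTorch snippet (not the authors' code) showing the scoring and pruning of a single linear layer: the function name, the `sparsity` parameter, and the per-output-row pruning granularity are assumptions for the sake of the example, and the calibration activations are just a small batch of inputs captured at that layer, with no gradients or retraining involved.

```python
import torch

def prune_linear_by_activation_norm(weight: torch.Tensor,
                                    calib_inputs: torch.Tensor,
                                    sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the lowest-scoring weights, scored by |W_ij| * ||X_j||_2.

    weight:       (out_features, in_features) weight matrix of a linear layer
    calib_inputs: (num_tokens, in_features) input activations collected from a
                  small calibration set
    sparsity:     fraction of weights to remove in each output row (assumed
                  per-row pruning for this sketch)
    """
    # L2 norm of each input feature's activations across the calibration tokens
    act_norm = calib_inputs.norm(p=2, dim=0)          # shape: (in_features,)

    # Importance score: element-wise product of weight magnitude and activation norm
    score = weight.abs() * act_norm.unsqueeze(0)      # shape: (out, in)

    # Remove the lowest-scoring weights within each output row
    num_prune = int(weight.shape[1] * sparsity)
    pruned = weight.clone()
    if num_prune > 0:
        _, idx = torch.topk(score, num_prune, dim=1, largest=False)
        pruned.scatter_(1, idx, 0.0)
    return pruned
```

Because the score only needs weight magnitudes and a single forward pass over a calibration set, the whole procedure avoids any retraining or gradient computation.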