Using structured weight pruning and knowledge distillation, the @NVIDIAAI research team refined Llama 3.1 8B into a new Llama-3.1-Minitron 4B. They're releasing the new models on @huggingface and shared a deep dive on how they did it: https://go.fb.me/b2h2c8
NVIDIA Refines Llama 3.1 8B into Minitron 4B Model
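For readers unfamiliar with the second technique mentioned above: knowledge distillation trains a smaller "student" model to match the softened output distribution of a larger "teacher". The sketch below is a minimal, generic illustration of the classic temperature-scaled distillation loss (Hinton et al.), not NVIDIA's actual training pipeline; the function names and the NumPy-only setup are illustrative choices.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields a softer distribution."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor keeps gradient magnitudes comparable across temperatures,
    following the standard formulation. Purely illustrative of the technique,
    not NVIDIA's implementation.
    """
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T)
```

When the student's logits match the teacher's exactly, the loss is zero; it grows as the student's distribution diverges from the teacher's.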