(4/n) At 75% sparsity, MediSwift-XL outperforms the dense MediSwift-Med despite having the same number of non-embedding parameters. This highlights the advantage of training larger, sparse models over smaller, dense ones.
MediSwift-XL Sparse Model Outperforms Dense Competitor