Its advantages, George, are best understood in pre-training. How it performs at inference time is less well understood, and there may be trade-offs there where "dense" models do better.
Sparse vs Dense Models: Pre-training and Inference Trade-offs