Highlights of this work include ResNet-18 on ImageNet achieving a 3.5% accuracy improvement and #GPT-3 Small on WikiText-103 reducing perplexity by 0.4, both matching larger dense model variants that use 2x or more FLOPs. (3/4)
ResNet and GPT-3 achieve efficiency gains matching larger models
