Using this recipe, we train a competitive long-sequence model at the 13B-parameter scale. It achieves 2 to 12 points higher accuracy across a wide variety of long-sequence tasks from SCROLLS and ZeroSCROLLS. (7/10)
13B Parameter Long Sequence Model Achieves Competitive Accuracy