To train this model, we use curriculum learning, gradually increasing the training sequence length from 2K to 8K tokens. We additionally train on an instruction-tuning dataset sampled from various popular sources and curated in-house.
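As a rough illustration of a sequence-length curriculum, the sketch below maps a training step to a maximum token length. The source only states the 2K and 8K endpoints; the intermediate 4K stage, the equal-length phases, and the names `seq_len_for_step` and `truncate_batch` are assumptions made for this example, not the authors' actual schedule.

```python
def seq_len_for_step(step, total_steps, stages=(2048, 4096, 8192)):
    """Return the maximum sequence length to train on at a given step.

    The stage values and the equal split between stages are illustrative
    assumptions; the source only gives the 2K start and 8K end points.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    # Divide training into equal-sized phases, one per stage.
    stage_idx = step * len(stages) // total_steps
    return stages[stage_idx]


def truncate_batch(token_ids, max_len):
    """Clip every tokenized sequence in a batch to at most max_len tokens."""
    return [seq[:max_len] for seq in token_ids]


if __name__ == "__main__":
    total = 300
    for step in (0, 100, 200):
        print(step, seq_len_for_step(step, total))
```

In a real training loop, the returned length would typically control how documents are chunked or packed before batching, so early steps see short contexts and later steps see the full 8K window.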
Curriculum Learning Boosts Model Training From 2K to 8K Tokens