The first step is to run supervised fine-tuning (SFT) on our general-purpose 1.3B chat model with the right data mixture. We iterated over many open-source datasets to find the best mix. This raises performance, but it also makes the model extremely verbose (>10k tokens on average).
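To make the idea of a data mixture concrete, here is a minimal sketch of weighted sampling across several datasets, plus a rough verbosity check. The dataset names, weights, and the whitespace-based token count are all assumptions for illustration; they are not the actual mixture or tokenizer used for this model.

```python
import random
from statistics import mean


def sample_mixture(datasets, weights, n, seed=0):
    """Draw n examples, choosing the source dataset of each one by its weight."""
    rng = random.Random(seed)
    names = list(datasets)
    probs = [weights[name] for name in names]
    picks = rng.choices(names, weights=probs, k=n)
    return [(name, rng.choice(datasets[name])) for name in picks]


def mean_response_length(responses):
    """Average length in whitespace-separated words.

    This is only a rough proxy; a real check would use the model's tokenizer.
    """
    return mean(len(r.split()) for r in responses)


# Hypothetical toy datasets and mixture weights (not the real ones).
datasets = {
    "chat": ["Hi there.", "Sure, here is a short answer."],
    "reasoning": ["Step 1: add the numbers. Step 2: check the sum. The answer is 4."],
}
weights = {"chat": 0.3, "reasoning": 0.7}

mixture = sample_mixture(datasets, weights, n=1000)
avg_len = mean_response_length([text for _, text in mixture])
```

Tracking a length statistic like `avg_len` alongside benchmark scores while varying the weights is one way to notice when a mixture pushes the model toward very long outputs.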
SFT Fine-tuning Strategy for 1.3B Chat Model
