AI Dynamics

Global AI News Aggregator


SFT Fine-tuning Strategy for 1.3B Chat Model

The first step is supervised fine-tuning (SFT) of our general-purpose 1.3B chat model on the right data mixture. We iterated over many open-source datasets to find the ideal mix. This is great for raising performance, but it also makes the model extremely verbose (>10k tokens on average).
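As a rough illustration of iterating over a data mix while watching verbosity, here is a minimal sketch. It is not the author's pipeline: the source names, weights, and toy examples are hypothetical, and whitespace-separated word count is only a crude stand-in for a real tokenizer's token count.

```python
import random
from typing import Dict, List


def build_mixture(
    sources: Dict[str, List[dict]],
    weights: Dict[str, float],
    total: int,
    seed: int = 0,
) -> List[dict]:
    """Sample roughly `total` examples across sources in proportion to `weights`."""
    rng = random.Random(seed)
    weight_sum = sum(weights.values())
    mixture: List[dict] = []
    for name, examples in sources.items():
        # Rounding means the final size can differ slightly from `total`.
        quota = round(total * weights[name] / weight_sum)
        if quota <= len(examples):
            picked = rng.sample(examples, quota)     # without replacement
        else:
            picked = rng.choices(examples, k=quota)  # upsample a small source
        mixture.extend({"source": name, **ex} for ex in picked)
    rng.shuffle(mixture)
    return mixture


def mean_response_length(examples: List[dict]) -> float:
    """Average response length in whitespace-separated words (a token-count proxy)."""
    lengths = [len(ex["response"].split()) for ex in examples]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Hypothetical toy sources: one terse, one long-winded.
sources = {
    "short_chat": [{"prompt": f"q{i}", "response": "ok sure"} for i in range(10)],
    "long_reasoning": [
        {"prompt": f"p{i}", "response": " ".join(["step"] * 50)} for i in range(10)
    ],
}
weights = {"short_chat": 0.5, "long_reasoning": 0.5}

mixture = build_mixture(sources, weights, total=8)
print(len(mixture), mean_response_length(mixture))
```

Shifting weight toward the long-response source raises the average response length of the training mix, which is one plausible way a mixture tuned purely for benchmark performance could end up teaching the model to be verbose.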

→ View original post on X — @maximelabonne