AI Dynamics

Global AI News Aggregator

Post-Training Data Strategies: SFT, RLHF, and DPO Approaches

4/ We are also seeing remarkably similar data strategies for post-training from most labs at this point (at least what was published from Meta+Apple):
– Hybrid data SFT, RLHF, & DPO setups
– Synthetic data on code and math
– Post-training data for most important capabilities

→ View original post on X (@alexandr_wang)
