4/ We are also seeing remarkably similar data strategies for post-training from most labs at this point (at least what was published from Meta+Apple):
– Hybrid data SFT, RLHF, & DPO setups
– Synthetic data on code and math
– Post-training data for the most important capabilities
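
For readers less familiar with DPO, here is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from the Meta or Apple reports. The function names, the `beta` default, and the use of summed sequence log-probabilities as inputs are all assumptions made for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair.

    Inputs are assumed to be summed token log-probabilities of each
    response under the trainable policy and a frozen reference model.
    beta=0.1 is an illustrative default, not a recommended setting.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) equals softplus(-logits).
    return softplus(-logits)
```

The loss shrinks as the policy raises the chosen response relative to the rejected one, measured against the reference model, which is why this setup needs paired preference data rather than the single demonstrations used for SFT.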
Post-Training Data Strategies: SFT, RLHF, and DPO Approaches