AI Dynamics

Global AI News Aggregator

SFT and RL Training Pipeline: Limitations of Cold Start Approach

Yes and no. Their pipeline was SFT (cold start) → RL → SFT (CoT + knowledge) → RL. I'm not sure they focused on providing solutions to hard problems via the SFT CoT stage, though, because that SFT data was generated by the previous model: if that model can't solve a problem, you're stuck.
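To make the "you're stuck" point concrete, here is a minimal toy sketch, not the actual pipeline. It assumes the SFT CoT data is collected by sampling the previous model several times per problem and keeping only problems with at least one correct sample. The function name `collect_sft_data`, the per-problem solve probabilities, and the sample count are all hypothetical, chosen only for illustration.

```python
import random


def collect_sft_data(solve_prob, samples_per_problem=8, seed=0):
    """Toy model of self-generated SFT data (hypothetical helper).

    solve_prob maps a problem name to the assumed probability that the
    previous model produces a correct solution in a single sample.
    A problem contributes SFT data only if at least one sample is correct.
    """
    rng = random.Random(seed)
    kept, unsolved = [], []
    for name, p in solve_prob.items():
        # Draw up to `samples_per_problem` attempts; stop at the first success.
        solved = any(rng.random() < p for _ in range(samples_per_problem))
        (kept if solved else unsolved).append(name)
    return kept, unsolved


# Assumed solve rates: the previous model never solves the "hard" problem.
solve_prob = {"easy": 1.0, "medium": 0.3, "hard": 0.0}
kept, unsolved = collect_sft_data(solve_prob)
print("problems with SFT data:", kept)
print("problems with no SFT data:", unsolved)
```

In this toy setup, a problem the previous model never solves yields no training example no matter how many samples are drawn, so the SFT stage cannot teach it.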

→ View original post on X (@rasbt)
