AI Dynamics

Global AI News Aggregator

SFT Model Training with Reward Function Iterations

Step 3: Sample a prompt from the dataset, pass it to the SFT model, and get the generated response. Use the reward function (RF) model to calculate a reward for that response, and use the reward to update the SFT model. Repeat this loop over many prompts. That's how they developed it. 9/9
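The loop in Step 3 can be illustrated with a toy sketch. The post does not name the update rule, so this assumes plain REINFORCE with a running-mean baseline, a much simpler substitute for the PPO update used in InstructGPT-style RLHF pipelines. Everything here is hypothetical: `ToyPolicy` stands in for the SFT model (a softmax over three canned responses per prompt), `reward_fn` stands in for the trained RF model, and `PROMPTS`, `RESPONSES` and `train` are illustrative names only.

```python
import math
import random

# Hypothetical toy data: a real setup uses a prompt dataset and free-form text.
PROMPTS = ["greet", "thank"]
RESPONSES = ["hello", "thanks", "go away"]


def reward_fn(prompt: str, response: str) -> float:
    """Hypothetical stand-in for the trained reward (RF) model."""
    preferred = {"greet": "hello", "thank": "thanks"}
    if response == preferred[prompt]:
        return 1.0
    if response == "go away":
        return -1.0
    return 0.0


class ToyPolicy:
    """Hypothetical stand-in for the SFT model: one softmax over RESPONSES per prompt."""

    def __init__(self) -> None:
        # Uniform logits; a real policy would start from the SFT weights.
        self.logits = {p: [0.0] * len(RESPONSES) for p in PROMPTS}

    def probs(self, prompt: str) -> list[float]:
        exps = [math.exp(z) for z in self.logits[prompt]]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, prompt: str, rng: random.Random) -> int:
        # Returns the index of the sampled response.
        return rng.choices(range(len(RESPONSES)), weights=self.probs(prompt))[0]


def train(steps: int = 2000, lr: float = 0.1, seed: int = 0) -> ToyPolicy:
    rng = random.Random(seed)
    policy = ToyPolicy()
    baseline = 0.0  # running mean of rewards, used to reduce gradient variance
    for _ in range(steps):
        prompt = rng.choice(PROMPTS)            # 1. sample a prompt
        idx = policy.sample(prompt, rng)        # 2. generate a response
        r = reward_fn(prompt, RESPONSES[idx])   # 3. score it with the reward model
        advantage = r - baseline
        baseline += 0.01 * (r - baseline)
        # 4. REINFORCE update: d log softmax_idx / d logit_j = 1[j == idx] - p_j
        p = policy.probs(prompt)
        for j in range(len(RESPONSES)):
            grad = (1.0 if j == idx else 0.0) - p[j]
            policy.logits[prompt][j] += lr * advantage * grad
    return policy


if __name__ == "__main__":
    trained = train()
    for prompt in PROMPTS:
        probs = trained.probs(prompt)
        best = max(range(len(RESPONSES)), key=lambda j: probs[j])
        print(prompt, "->", RESPONSES[best], round(probs[best], 3))
```

After training, most of the probability mass for each prompt should sit on the response the toy reward function prefers. Subtracting the baseline does not change the expected gradient, but it keeps responses with zero reward from being reinforced once the average reward has risen.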

→ View original post on X: @sumanth_077
