On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Generalization of SFT: Reinforcement Learning with Reward Rectification
