As part of this work, we’re also releasing SWEET-RL, a novel RL algorithm for long-horizon, multi-turn tasks that performs better credit assignment. Our experiments demonstrate that SWEET-RL achieves a 6% absolute improvement in success and win rates on
SWEET-RL: Novel Algorithm for Long-Horizon Multi-Turn Tasks