ByteDance recently introduced two RL algorithms for building reasoning models: DAPO and VAPO. According to the Seed-Thinking-v1.5 technical report, VAPO is now the state-of-the-art (SOTA) solution among value-based methods, while DAPO sets a new SOTA result among value-free approaches.