The trick in our case is treating failures as training signal, which only works with a reliable verifier and a held-out gate. Open-ended agent work without success metrics is the hardest case to optimize this way.
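To make the verifier and the held-out gate concrete, here is a minimal sketch on a toy addition task. Every name in it (`Attempt`, `collect_failures`, `pass_rate`, `gated_update`, and the toy policies) is hypothetical and chosen for illustration; it is not our actual pipeline. How a candidate policy gets trained from the collected failures is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Policy = Callable[[str], str]  # hypothetical: maps a task string to an output string


@dataclass(frozen=True)
class Attempt:
    task: str
    output: str


Verifier = Callable[[Attempt], bool]


def collect_failures(attempts: List[Attempt], verifier: Verifier) -> List[Attempt]:
    """Keep only the attempts the verifier rejects; these are the training signal."""
    return [a for a in attempts if not verifier(a)]


def pass_rate(policy: Policy, tasks: List[str], verifier: Verifier) -> float:
    """Fraction of tasks on which the policy's output passes the verifier."""
    if not tasks:
        raise ValueError("tasks must be non-empty")
    passed = sum(verifier(Attempt(t, policy(t))) for t in tasks)
    return passed / len(tasks)


def gated_update(current: Policy, candidate: Policy, held_out: List[str],
                 verifier: Verifier, min_gain: float = 0.0) -> Policy:
    """Accept the candidate only if it beats the current policy on held-out tasks."""
    gain = pass_rate(candidate, held_out, verifier) - pass_rate(current, held_out, verifier)
    return candidate if gain > min_gain else current


# --- Toy setup (assumed for illustration): tasks look like "a+b" ---

def sum_verifier(attempt: Attempt) -> bool:
    a, b = attempt.task.split("+")
    return attempt.output == str(int(a) + int(b))


def buggy_policy(task: str) -> str:
    a, _ = task.split("+")
    return a  # bug: ignores the second operand


def fixed_policy(task: str) -> str:
    a, b = task.split("+")
    return str(int(a) + int(b))


def make_memorizing_policy(patches: Dict[str, str]) -> Policy:
    """A candidate that only patches the exact tasks it failed on."""
    return lambda task: patches.get(task, buggy_policy(task))


train_tasks = ["1+2", "3+0"]
held_out_tasks = ["2+2", "5+1", "4+0"]  # never used to build a candidate

failures = collect_failures(
    [Attempt(t, buggy_policy(t)) for t in train_tasks], sum_verifier
)
memorizing_policy = make_memorizing_policy({"1+2": "3"})
```

In this toy, `memorizing_policy` fixes the one training failure but gains nothing on `held_out_tasks`, so `gated_update` keeps `buggy_policy`; `fixed_policy` generalizes and is accepted. Without a verifier such as `sum_verifier` there is nothing to label an attempt as a failure in the first place, which is why open-ended work without success metrics is hard to optimize this way.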