Thanks @fchollet. Indeed, the experiments used a large k for Pass@k, which let us focus on evaluating the search capability of AB-MCTS rather than on the official evaluation criterion of k=2. We also used tasks from the public eval. Hopefully we’ll get down to Pass@2 someday! 🙂
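To make the difference between a large k and k=2 concrete, here is a minimal sketch of how a Pass@k score could be computed. It assumes each task's attempts are recorded in order and that a task counts as solved if any of its first k attempts is correct. The function name `pass_at_k` and the toy data are hypothetical and are not taken from the AB-MCTS experiments.

```python
from typing import Sequence


def pass_at_k(attempts_per_task: Sequence[Sequence[bool]], k: int) -> float:
    """Return the fraction of tasks solved by at least one of the first k attempts."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts_per_task:
        raise ValueError("need at least one task")
    # A task is solved if any of its first k attempts is marked correct.
    solved = sum(any(attempts[:k]) for attempts in attempts_per_task)
    return solved / len(attempts_per_task)


# Hypothetical data: each inner list marks whether attempt i was correct.
toy_results = [
    [False, True, False, False],   # solved on attempt 2
    [False, False, False, True],   # solved only on attempt 4
    [False, False, False, False],  # never solved
]

print(pass_at_k(toy_results, k=2))  # 1 of 3 tasks solved
print(pass_at_k(toy_results, k=4))  # 2 of 3 tasks solved
```

On the same attempts, a larger k can only keep the score the same or raise it, which is why results at a large k are not directly comparable to the official k=2 criterion.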
AB-MCTS Search Capability Evaluation with High Pass@k Metrics