AI Dynamics

Global AI News Aggregator

AB-MCTS Search Capability Evaluation with High Pass@k Metrics

Thanks @fchollet. Indeed, the experiments used a large Pass@k which allowed us to focus on evaluating the search capability of AB-MCTS, rather than the official evaluation criteria based on k=2. We also used tasks in the public eval. Hopefully we’ll get down to Pass@2 someday! 🙂
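For readers unfamiliar with the metric, here is a minimal sketch of how Pass@k differs from Pass@2. It assumes the simple definition in which a task counts as solved if any of its first k attempts is correct. The function name `pass_at_k` and the sample data are hypothetical and for illustration only; this is not the evaluation code used in the AB-MCTS experiments.

```python
from typing import Sequence


def pass_at_k(attempts_per_task: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of tasks with at least one correct attempt among the first k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts_per_task:
        raise ValueError("need at least one task")
    # A task is solved if any of its first k attempts is correct.
    solved = sum(any(attempts[:k]) for attempts in attempts_per_task)
    return solved / len(attempts_per_task)


# Hypothetical outcomes: one row per task, one boolean per attempt, in order.
results = [
    [False, True, False, False],
    [False, False, False, True],
    [True, False, False, False],
    [False, False, False, False],
]

print(pass_at_k(results, 2))  # strict budget: only the first two attempts count
print(pass_at_k(results, 4))  # larger budget: late successes also count
```

With a larger k, tasks that are only solved on a late attempt still count, which is why a large Pass@k says more about what the search can eventually find than about performance under a two-attempt budget.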

→ View original post on X (@hardmaru)
