ROSCOE is a first-of-its-kind suite of metrics for scoring step-by-step reasoning. By publishing this study, we hope to provide a foundation that enables scalable, systematic evaluation and benchmarking of new language models. See the paper on arXiv: https://arxiv.org/abs/2212.07919
ROSCOE: New Metrics Suite for Evaluating Step-by-Step Reasoning