Evaluation is everything! While testing Inflection-2.5, we found that MT-Bench has a bunch of incorrect answers. Here we share the corrections for everyone to use, and we release a new Physics GRE benchmark for people to try out. inflection.ai/inflection-2-5
→ View original post on X — @inflectionai, 2024-03-07 15:15 UTC