We kept MRCR in the system card for scientific honesty, but we've actually been phasing it out slowly. Two reasons: (1) it's built around stacking distractors to trick the model, which isn't how people actually use long context, and (2) we care more about applied
Phasing Out MRCR: Rethinking Long Context Evaluation Methods