AI Dynamics

Global AI News Aggregator

Evaluation Rubrics Fail on Novel AI Research Paradigms

Long-form evaluation breaks on novelty. A rubric written before the research exists can't score research that shifts the criteria themselves. A set of 2,500 rubrics is a real dataset, but the measurement question is whether that set can handle reports that break the rubrics' own assumptions. That's where deep

→ View original post on X: @whats_ai