AI Dynamics

Global AI News Aggregator

AI Evaluation Methodology: Beyond Heuristic Approaches

It doesn't really measure what I mentioned (throughput, CoT, etc.) but yes, it's better overall. It might still be fundamentally flawed because it tries to fix this issue with heuristics instead of a more accurate evaluation.

→ View original post on X — @maximelabonne