AI Dynamics

Global AI News Aggregator

Human Preference Evaluation as LLM Gold Standard Despite Its Limitations

2/2 So, the gold standard remains human preference evaluation, which is expensive and difficult to automate and scale. But even human preference evaluation has its flaws. E.g., see The False Promise of Imitating Proprietary LLMs (https://arxiv.org/abs/2305.15717).

→ View original post on X — @rasbt