AI Dynamics

Global AI News Aggregator

Reward Design Balances Correctness, Preference, and Efficiency

Our reward design combines correctness, preference, and efficiency. Preference only counts when the answer is correct. This keeps the model from optimizing for better-sounding wrong answers.
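The gating described above can be illustrated with a minimal sketch. The post does not give the actual formula, so everything here is an assumption for illustration: the function name `combined_reward`, the weights, the additive form, and the choice to leave the efficiency term ungated are all hypothetical. The only property taken from the post is that the preference term contributes nothing unless the answer is correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the post does not state any values.
    correctness: float = 1.0
    preference: float = 0.5
    efficiency: float = 0.1


def combined_reward(
    is_correct: bool,
    preference_score: float,
    efficiency_score: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Combine three signals, counting preference only for correct answers.

    preference_score and efficiency_score are assumed to lie in [0, 1].
    """
    reward = weights.correctness * (1.0 if is_correct else 0.0)

    # Gate: a wrong answer earns no preference reward, however good it sounds.
    if is_correct:
        reward += weights.preference * preference_score

    # Assumption: efficiency is added regardless of correctness.
    # The post does not say whether this term is gated as well.
    reward += weights.efficiency * efficiency_score
    return reward


if __name__ == "__main__":
    # Two wrong answers with very different preference scores score the same.
    print(combined_reward(False, 0.9, 0.5) == combined_reward(False, 0.1, 0.5))
    # Among correct answers, the preferred one scores higher.
    print(combined_reward(True, 0.9, 0.5) > combined_reward(True, 0.1, 0.5))
```

Under this sketch, improving how a wrong answer sounds cannot raise its reward, which is the behavior the post describes.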

→ View original post on X: @perplexity_ai
