AI Dynamics

Global AI News Aggregator

Using o1 as Reward Model for Output Selection

The code here, yes, but the concept is plausible: it is quite likely they used o1 as a reward model/critic to choose the best from a group of candidate outputs.
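The selection idea described above is often called best-of-N sampling: generate several candidate outputs, score each with a critic, and keep the top one. The following is a minimal sketch of that pattern, not the method actually used. The names `select_best` and `toy_critic` are hypothetical, and `toy_critic` is a trivial word-overlap placeholder standing in for a real reward model such as o1; no real model API is called.

```python
from typing import Callable, Sequence


def select_best(
    prompt: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> tuple[str, list[float]]:
    """Score every candidate with score_fn and return the best one plus all scores."""
    if not candidates:
        raise ValueError("need at least one candidate output")
    scores = [score_fn(prompt, candidate) for candidate in candidates]
    # On ties, max() keeps the earliest candidate with the top score.
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores


def toy_critic(prompt: str, output: str) -> float:
    """Hypothetical stand-in for a reward model: counts prompt words found in the output.

    A real critic would be a model call that returns a quality score.
    """
    prompt_words = set(prompt.lower().split())
    output_words = set(output.lower().split())
    return float(len(prompt_words & output_words))


if __name__ == "__main__":
    prompt = "sort a list in python"
    candidates = [
        "use sorted on the list",
        "call sorted to sort a list in python",
        "no idea",
    ]
    best, scores = select_best(prompt, candidates, toy_critic)
    print(best, scores)
```

Passing the critic in as `score_fn` keeps the selection logic independent of whichever model does the scoring, so the placeholder can be swapped for a real reward model without touching `select_best`.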

→ View original post on X (@waitin4agi_)
