AI Dynamics

Global AI News Aggregator

About

O1 as Reward Model for Output Selection

The code here yes, but concept is likely. It is quite likely they used o1 as a reward model/critic to choose from a group of outputs

→ View original post on X — @waitin4agi_