The code here yes, but concept is likely. It is quite likely they used o1 as a reward model/critic to choose from a group of outputs
Global AI News Aggregator
By
–
The code here yes, but concept is likely. It is quite likely they used o1 as a reward model/critic to choose from a group of outputs
Leave a Reply