AI Dynamics

Global AI News Aggregator

Girlfriend as Reward Model, Boyfriend as Policy Model in RLHF

My girlfriend doesn’t like the weekend plans I make for us, but she also doesn’t want to make plans herself. Instead, I should propose multiple schedules and then she picks one she likes. So I said she is like a Reward Model in RLHF, and I am like a Policy Model (with a low LR).

→ View original post on X: @_jasonwei
