AI Dynamics

Global AI News Aggregator

Formal RL Problem Definition and RLHF Practice Discussion

Thanks, @CsabaSzepesvari. Would you mind formally defining the RL problem for everyone here? (I know you can do it better than most.) I'd love for us to start with the formal definition and address the practice of RLHF in that context. In particular, in most of RLHF, the states

Original post on X by @nandodf.