Thanks, @CsabaSzepesvari. Would you mind formally defining the RL problem for everyone here? (I know you can do it better than most.) I'd love for us to start with the formal definition and address the practice of RLHF in that context. In particular, in most of RLHF, the states
Formal RL Problem Definition and RLHF Practice Discussion