AI Dynamics

Global AI News Aggregator

Instruction-Finetuning vs RLHF: Inconclusive Research Findings

Thanks! Right now, it's a bit inconclusive. Lots of papers show that supervised instruction-finetuning is sufficient (vs. RLHF instruction-finetuning). (I cannot put my finger on it, but I feel like we don't have good evaluations yet, and this is not the end of the story.)

→ View original post on X (@rasbt)
