AI Dynamics

Global AI News Aggregator

OpenAssistant’s shift from RLHF to supervised instruction-finetuning

So on that note, OpenAssistant's earlier models used RLHF for instruction-finetuning (https://huggingface.co/OpenAssistant/oasst-rlhf-2-llama-30b-7k-steps-xor…); the later ones seem to use supervised instruction-finetuning. Maybe @ykilcher has some insight into whether RLHF wasn't worth the effort vs. supervised?

→ View original post on X, by @rasbt
