So on that note, OpenAssistant's earlier models used RLHF for instruction-finetuning (https://huggingface.co/OpenAssistant/oasst-rlhf-2-llama-30b-7k-steps-xor …); the later ones seem to use supervised instruction-finetuning. Maybe @ykilcher has some insight into whether RLHF wasn't worth the effort compared with supervised finetuning?
OpenAssistant’s shift from RLHF to supervised instruction-finetuning