I just mean long term: imo the RL fine-tuning paradigm is a big upgrade over SFT alone (expert imitation) for LLMs at the current stage of development, and it will continue to grow substantially.