It’s funny to me that no one has been able to figure out whether we really need reinforcement learning to train language models. Are we collectively not smart enough to figure out the math? Or is there some theory we don’t have that would make questions like this easier?
Do We Really Need Reinforcement Learning for Language Models?