The current RL setup might not be the right objective for reasoning models. In our latest AI4Science talk, Fahim (@FahimTajwar10), PhD student at CMU, presented “Maximum Likelihood Reinforcement Learning”, a new framework for binary-reward tasks like math reasoning, navigation,
Maximum Likelihood RL Framework for Math Reasoning Models
