LLMs usually rely on labeled data to improve. But what if they could self-improve at test time, without any labels? Introducing TTRL: a new RL framework that boosts LLM reasoning by up to 159% on AIME 2024 using only unlabeled test data. Trending #1 on alphaXiv.
TTRL Boosts LLM Reasoning by up to 159% Using Unlabeled Test Data
