Meta’s Continuous Chain-of-Thought Reasoning for LLMs

Can reasoning LLMs think better if their chain of thought is continuous rather than discrete? This Meta paper introduces the first scalable way to train continuous CoTs with reinforcement learning, with no need to distill from discrete reference traces. By using "soft" tokens (probability-weighted mixtures of token embeddings, with noise injected for exploration), the model can be trained with RL directly in the continuous embedding space.
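To make the idea concrete, here is a minimal NumPy sketch of what a "soft" token could look like: instead of sampling one discrete token and looking up its embedding, the next-step input is a probability-weighted mixture of all token embeddings, with a little Gaussian noise added for RL exploration. The names (`soft_token`, `embedding_table`, `noise_std`) and the toy dimensions are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embed_dim = 8, 4
# Toy embedding table standing in for the LLM's input embeddings.
embedding_table = rng.normal(size=(vocab_size, embed_dim))

def softmax(logits):
    z = logits - logits.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def soft_token(logits, noise_std=0.1):
    # Discrete CoT would do: embedding_table[np.argmax(logits)]
    # A soft token instead mixes ALL embeddings by their probabilities,
    # keeping the reasoning step continuous and differentiable.
    probs = softmax(logits)
    mixed = probs @ embedding_table    # shape (embed_dim,)
    # Injected noise gives the RL objective something to explore.
    noise = rng.normal(scale=noise_std, size=embed_dim)
    return mixed + noise

logits = rng.normal(size=vocab_size)   # pretend model output for one step
tok = soft_token(logits)
print(tok.shape)                       # (4,)
```

The mixed vector is then fed back as the next input embedding, so the chain of thought never collapses to a single discrete token between steps.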