RL trains our models to "think" before they answer, a process known as test-time reasoning. To serve this capability to billions of users and efficiently use tokens, we rely on two key levers: thinking time penalties to optimize token use and multi-agent orchestration that boosts… pic.twitter.com/oHHap4NAg3
— AI at Meta (@AIatMeta) April 8, 2026