What a week! Just read the system card, and it looks like they implemented reasoning via RL. My guess is that the thinking on/off toggle is a system prompt. I wonder if they added inference-time scaling like o1 or if it’s just RL like R1. Has anyone found details on that?
System Card Analysis: Reasoning Implementation via RL Toggle