The most notable thing about this result is that this unnamed experimental reasoning model achieved the score without using any tools at all. It looks like it's just another classic next-token-predicting LLM with a bunch of reinforcement learning layered on top.
Experimental Reasoning Model Achieves Score Without Tool Usage
