Following up on my reasoning model article, I just read the new "s1: Simple Test-Time Scaling" paper, which describes an interesting method for improving reasoning models using a combination of pure supervised finetuning (SFT) and scaling inference compute.
In short, their
S1 Simple Test-Time Scaling: Improving Reasoning Models with SFT