
It also sets a new cost-performance frontier. On DSQA it scores 0.871, ahead of Anthropic's 0.815, at nearly half the cost per task. On WideSearch it leads on score while running cheaper.

