
Kimi K2.6 from @Kimi_Moonshot is a new open-source SOTA on HLE with tools, SWE-Bench Pro, and other benchmarks!
– HLE w/ tools – 54.0
– SWE-Bench Pro – 58.6
– SWE-bench Multilingual – 76.7

Looks like it is testing time now.

