PREVIEW: The Joy Of Benchmarks (Q1'26). My new #AI benchmark on out-of-domain programming languages (Joy) suggests that open-weights models are qualitatively *far* behind the frontier on logical reasoning and problem solving… The newest models: GLM-5, MiniMax M2.5, and Kimi
Open Weights Models Lag Behind Frontier on Logical Reasoning
