AI Dynamics

Global AI News Aggregator

Aletheia solves 6 of 10 FirstProof problems using Gemini DeepThink

We ran two Aletheia versions (differing only in their base model) powered by Gemini DeepThink. Together, they solved 6/10 problems (2, 5, 7, 8, 9, 10) according to the majority of expert assessments. Full transparency on our FirstProof interpretation and experiments: arxiv.org/abs/2602.21201. Evaluation is extremely hard! Only a handful of experts can even understand these problems, so we conducted our study very carefully. Crucially, our solutions were generated without any human intervention and submitted within the timeframe of the FirstProof challenge. The lead author of FirstProof confirmed this in the public Zulip discussion of our solutions: icarm.zulipchat.com/#narrow/….

Original post on X by @lmthang, 2026-02-25 16:08 UTC
