AI Dynamics

Global AI News Aggregator

@lmthang

FirstProof Problem 7 Solution Confirmed by Original Mathematician

By

@lmthang

–

25 February 2026 17h45

The correctness of our solution to FirstProof problem 7 is also confirmed by Jim Fowler, the mathematician who conjectured the question originally! See github.com/google-deepmind/s… for all our transcripts and solutions (both correct and incorrect ones!) as well as public discussion of P7 at icarm.zulipchat.com/#narrow/….

→ View original post on X — @lmthang, 2026-02-25 16:45 UTC

25 February 2026
Aletheia AI Solves Open Math Problem P7 Successfully

By

@lmthang

–

25 February 2026 17h34

This is a remarkable milestone in which our agent can work on a research problem for a very long time, then come back and tell us if it has succeeded or failed! We visualize the inference cost Aletheia decided to spend on each candidate solution (as a multiple of the inference cost of solving Erdős-1051, see our previous work nitter.net/lmthang/status/2018354…). P7 is extremely interesting. It has been an open problem for several years, and nobody else came close to solving it in the FirstProof contest per @tonylfeng. We initially thought Aletheia had no chance; it turned out Aletheia was right! Aletheia spent the most compute on P7, 16x the amount we used for Erdős-1051. Remarkably, per @kimshmath, "This was the first case that I have ever seen that an AI applies several deep mathematical results (by Cartan/Leray/Borel/Atiyah/Quillen/Novikov/Kasparov…) flawlessly. It is a very unique instance."

→ View original post on X — @lmthang, 2026-02-25 16:34 UTC

25 February 2026
Aletheia solves 6 of 10 FirstProof problems using Gemini DeepThink

By

@lmthang

–

25 February 2026 17h08

We ran two Aletheia versions (differing only by base model) powered by Gemini #DeepThink. Together, they solved 6/10 problems (2, 5, 7, 8, 9, 10) per majority expert assessments. Full transparency on our FirstProof interpretation and experiments: arxiv.org/abs/2602.21201. Evaluation is extremely hard! Only a handful of experts can even understand these problems. As such, we have conducted our study very carefully! Crucially, our solutions were generated without any human intervention and submitted within the timeframe of the FirstProof challenge. The lead author of FirstProof confirmed that fact in the public Zulip discussion of our solutions icarm.zulipchat.com/#narrow/….

→ View original post on X — @lmthang, 2026-02-25 16:08 UTC

25 February 2026
Aletheia Math Agent Solves Hard FirstProof Problems Autonomously

By

@lmthang

–

25 February 2026 17h02

Thrilled to share: #Aletheia, our math research agent, just solved 6/10 notoriously hard FirstProof problems autonomously, the best result in the inaugural challenge! To me, this is even bigger than our historic IMO-gold achievement last year; these problems challenge even top mathematicians. We share our results transparently, see paper and full thoughts in the thread. 👇

→ View original post on X — @lmthang, 2026-02-25 16:02 UTC

25 February 2026
Gemini 3.1 Pro Meme Video

By

@lmthang

–

19 February 2026 22h34

Gemini 3.1 Pro be like pic.twitter.com/ZwCauGxLar
— Google (@Google) 19 February 2026

→ View original post on X — @lmthang, 2026-02-19 21:34 UTC

19 February 2026
Human-AI Interaction Framework for Mathematics Assistance

By

@lmthang

–

15 February 2026 16h44

Yes, we provided 3 things for AI-assisted math:
* Human-AI interaction (HAI) card (photo), inspired by model cards
* Full transcripts: https://github.com/google-deepmind/superhuman/tree/main/aletheia…
* A label for novelty-autonomy, inspired by SAE Levels of autonomy; see the #Aletheia paper: https://arxiv.org/abs/2602.10177

→ View original post on X — @lmthang

15 February 2026
Google DeepMind’s Rapid Progress: From Bard to DeepThink

By

@lmthang

–

13 February 2026 3h40

Again, it has been a privilege witnessing the relentless progress from Google Brain to Google DeepMind:
* ChatGPT -> Bard announcement (Mar 2023): 100 days
* Announcement of IMO-gold achievement -> DeepThink v1 launch (Jul 2025): 10 days
* Announcement of Aletheia agent &

→ View original post on X — @lmthang

13 February 2026
Gemini Deep Think Powers Major Discoveries in Math Physics Computing

By

@lmthang

–

13 February 2026 1h56

and in case you missed it, here's how Gemini Deep Think has powered various discoveries from maths, to physics and computer science!

→ View original post on X — @lmthang

13 February 2026
AI Models Achieve 76.7% on IMO ProofBench Mathematics

By

@lmthang

–

13 February 2026 1h55

And here's the little leaderboard that we maintain on IMO ProofBench in case you haven't seen it.
* Our IMO-gold model (non-public, Jul 2025) got 65.7%.
* Gemini 3 Deep Think (public, Feb 2026) now got 76.7%.
* Aletheia (non-public) with inference-time scaling law +

→ View original post on X — @lmthang

13 February 2026
DeepThink Achieves IMO Gold Using Inference-Time Scaling

By

@lmthang

–

12 February 2026 18h07

DeepThink is exceptionally good when powered by an inference-time scaling law that we showed in our Aletheia paper (https://arxiv.org/abs/2602.10177)! These were benchmarked on our IMO-ProofBench graded by experts, which was the north-star metric leading to our IMO-gold achievement. Amazing

→ View original post on X — @lmthang

12 February 2026
