AI Dynamics

Global AI News Aggregator

All models failed M-Cube; GPT-o3 barely reached 72% after a 5 million-fold search-space reduction

On M-Cube, all models failed.
Zero percent on the full tasks, even with 10,000+ tokens of “reasoning.”
On the simplified version, GPT-o3 barely crossed 72%, and only after the search space was reduced 5 million-fold.

→ View original post on X — @aibreakfast