It is why the gold medals at the various math and coding Olympiads were a big deal: unsaturated benchmarks that weren't in the training data, with clear human comparisons. We are down to the various measures of task length (METR), HLE, FrontierMath, vending machine operation…
@emollick
-
Intelligence Index Benchmarks Need Improvement Beyond Saturation
Not to take away from Grok 4 Fast (which seems like a very good model) or from Artificial Analysis (one of the few organizations doing independent benchmarking), but the Intelligence Index is an average of pretty saturated benchmarks (aside from HLE); we really need better ones.
-
AI matching web search for political information accuracy
A cautiously optimistic result on AI and disinformation. A week before the 2024 UK elections, 13% of all voters had used AI for political topics. A randomized trial found this may be good: using AI led to similar gains in true knowledge as doing web search, regardless of model & prompts.
-
Self-Correcting AI Agents Enable Exponential Task Horizon Gains
I think the significance of this is under-appreciated: the assumption has often been that AI agents are brittle, since one failure in a chain breaks the whole task. But this paper shows smart models are self-correcting, & that small gains in accuracy lead to exponential gains in task horizons.
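The compounding claim can be sketched with a toy model (my own illustration, not the paper's actual analysis): if each step of a multi-step task succeeds independently with probability p, the longest task an agent completes at least half the time is ln(0.5)/ln(p) steps, so linear-looking gains in per-step accuracy yield exponential gains in horizon.

```python
import math

def task_horizon(step_accuracy: float, target: float = 0.5) -> float:
    """Longest chain of steps completed with at least `target` probability,
    assuming each step succeeds independently with `step_accuracy`."""
    return math.log(target) / math.log(step_accuracy)

# Modest gains in per-step accuracy give outsized horizon gains:
for p in (0.90, 0.95, 0.99):
    print(f"per-step accuracy {p:.2f} -> ~{task_horizon(p):.0f}-step horizon")
# prints horizons of roughly 7, 14, and 69 steps
```

The independence assumption is the pessimistic "brittle chain" view; self-correction effectively raises the per-step success rate, which is exactly where the exponential leverage lives.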
-
Coding Tools Enable Problem-Solving Without Expert Programming Skills
Sure, there are other ways to work with these tools, but all of them require understanding something about coding practices. And sure, not knowing those hurts your ability to do "real programming" – but the coding tools are good enough to solve lots of problems with tiny, bad code.
-
Accessibility barriers to agentic AI coding tools for non-developers
One of the larger barriers to more people using agentic coding tools from the big AI companies to build their own small apps is that you have to go through GitHub to use them, a website that is nearly incomprehensible to most non-coders.
-
Coding Gatekeeping in AI Development Labs
Coder says what? Joking. Yes, I get there is a plausible reason why coding is elevated the way it is in the labs, but it still leaves almost all work & workers (& students) out of the really interesting part of rapid AI development that only programmers get to see right now.
-
Frontier LLMs and Specialized Models: Essential for AI Tool Development
Yes, every other company on the planet is rushing to release AI tools for other forms of work, but if you don't own a frontier LLM & you can't train specialized models to go with your specialized AI-for-X interface, you are limited in what you can accomplish. Again, see coding.
-
AI Labs Prioritize Code Tools Over Other Specialized Applications
The problem with the AI labs being run by coders who think code is the most vital thing in the world is that the labs keep developing supercool specialized tools for coding (Codex, Claude Code, Cursor, etc.) while every other form of work is stuck with generic chatbots.
-
Scaling Returns in AI: Reasoning Models Drive Exponential Project Completion
Paper argues that diminishing returns to AI scale are an illusion. Economic value comes from completing long projects, not answering single questions. And accuracy determines how long a project an AI can complete: small gains compound exponentially! Reasoning models are much more accurate, with big impacts.