@dair_ai - AI Dynamics

Combining LLMs: analysis of routing, voting, cascades

By

–

27 June 2026 23h34

When does combining LLMs help? Great analysis on combining language models, measured across 67 models from 21 providers. Any policy that routes, votes, cascades, or runs a mixture of agents and then returns one model's answer is bounded above by 1 minus beta, where beta is the

→ View original post on X — @dair_ai

27 June 2026

NVIDIA SOLAR automates speed-of-light performance analysis from PyTorch/JAX

By

@dair_ai

–

27 June 2026 20h34

NEW paper from NVIDIA. (bookmark it) Speed-of-light performance analysis tells you the theoretical floor of a workload, but teams still derive it by hand and freeze it. SOLAR automates the whole thing straight from PyTorch or JAX source. An LLM frontend translates arbitrary

→ View original post on X — @dair_ai

27 June 2026

Agent memory becomes data management system for LLMs

By

@dair_ai

–

24 June 2026 20h15

// Agent memory is a data system now // Great paper on long-term memory for LLM agents. (bookmark it) Agent memory has grown from simple retrieval into a full data-management layer with storage, retrieval, update, consolidation, and lifecycle governance. Yet most evaluations

→ View original post on X — @dair_ai

24 June 2026

Why Model Routers Fail for Coding: Information Deficit

By

@dair_ai

–

24 June 2026 2h17

What is actually limiting model routers for coding tasks? Most routers treat picking a model as a static, one-off classification. This paper identifies the real bottleneck as information deficit. Simply augmenting a vanilla LLM router with task-dimension-level performance

→ View original post on X — @dair_ai

24 June 2026

Learn Vercel’s new Eve agentic framework with hands-on labs

By

@dair_ai

–

23 June 2026 18h22

Learn to use the new eve agentic framework from Vercel.

Go try out the hands-on labs now. https://t.co/mC5nsSUNOe
— DAIR.AI (@dair_ai) 23 juin 2026

Learn to use the new eve agentic framework from Vercel. Go try out the hands-on labs now.

→ View original post on X — @dair_ai

23 June 2026

Largest LLM-as-Judge audit shows exact-match overstates skill

By

@dair_ai

–

22 June 2026 16h23

The largest LLM-as-a-Judge reliability audit yet. Researchers ran 21 judges from nine providers over roughly 541,000 judgments on MT-Bench, JudgeBench, and RewardBench. Findings: Validating a judge with exact-match agreement overstates its skill, because exact match does not

→ View original post on X — @dair_ai

22 June 2026

Top AI Papers of the Week: June 14-21 Highlights

By

@dair_ai

–

21 June 2026 17h55

The Top AI Papers of the Week (June 14 – June 21): – PreAct
– SpatialClaw
– Back on Track
– OpenClaw-Skill
– From Trainee to Trainer
– Compositional Skill Routing
– Can LLM Agents Infer World Models? Read on for more:

→ View original post on X — @dair_ai

21 June 2026

Evolving Meta-Skill for Multi-Agent Systems Without Weight Changes

By

@dair_ai

–

20 June 2026 19h10

// Evolving Meta-Skill for Multi-Agent Systems // Can a multi-agent system get better at orchestration without touching a single weight? Automatic MAS generation has been stuck between two bad options. Inference-time methods use frozen frontier models but never learn from past

→ View original post on X — @dair_ai

20 June 2026

AtomMem: Long-term memory for LLM agents using atomic facts

By

@dair_ai

–

19 June 2026 16h53

Great paper on long-term memory for LLM agents. (bookmark it) Coarse summaries drift and unconstrained updates corrupt, so AtomMem makes the unit of memory small. A Fact Executor pulls high-value atomic facts out of long interactions, organizes them into hierarchical event

→ View original post on X — @dair_ai

19 June 2026

How to Make Web Agent Skills Reusable

By

@dair_ai

–

18 June 2026 16h49

If you build web agents, this one is worth your time. It's on how to make agent skills reusable. (bookmark it) LLM web agents usually run as tool callers. Each turn, the model reads a fresh page and emits one low-level action, so horizons and policy-facing LLM completions both

→ View original post on X — @dair_ai

18 June 2026