AI Dynamics

Global AI News Aggregator


Multi-Turn Agent Evaluation and Context Compaction Limits

The paper does test multi-turn settings (24 tool calls in OfficeQA, 30 turns in SpreadsheetBench, 50 steps in ALFWorld), but mid-session auto-compaction isn't part of the eval. The skill being under 2K tokens probably helps it survive compaction, but that hasn't been validated against this failure mode.
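A rough sketch of what a check for that failure mode could look like. Everything here is an assumption for illustration, not the paper's harness or any real agent's compaction logic: `toy_compact`, `skill_survives`, the `pinned_share` rule, the 200-token tool outputs, and the whitespace word count standing in for a tokenizer are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str  # "skill" for the loaded skill text, "tool" for a tool-call turn
    text: str


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def toy_compact(history, budget, pinned_share=0.5):
    """Hypothetical compaction policy (an assumption, not a real agent's behavior).

    Skill messages are kept only if each fits within pinned_share of the budget;
    the remaining budget is then filled with the most recent tool turns.
    """
    pinned_limit = int(budget * pinned_share)
    kept_skill = [
        m for m in history
        if m.role == "skill" and approx_tokens(m.text) <= pinned_limit
    ]
    used = sum(approx_tokens(m.text) for m in kept_skill)

    recent = []
    for m in reversed([m for m in history if m.role != "skill"]):
        cost = approx_tokens(m.text)
        if used + cost > budget:
            break
        recent.append(m)
        used += cost
    # Restore chronological order for the kept tool turns.
    return kept_skill + list(reversed(recent))


def skill_survives(skill_text, n_turns, budget, turn_tokens=200):
    """Build a session (skill first, then n_turns tool outputs), compact it,
    and report whether the skill text is still present verbatim."""
    history = [Message("skill", skill_text)]
    history += [
        Message("tool", " ".join(["out"] * turn_tokens)) for _ in range(n_turns)
    ]
    compacted = toy_compact(history, budget)
    return any(m.role == "skill" and m.text == skill_text for m in compacted)


small_skill = " ".join(["w"] * 1500)  # under 2K "tokens"
large_skill = " ".join(["w"] * 3000)  # over the toy pinned limit

print(skill_survives(small_skill, n_turns=50, budget=4000))
print(skill_survives(large_skill, n_turns=50, budget=4000))
```

Under this toy policy the smaller skill is retained and the larger one is dropped, which matches the intuition in the post but proves nothing about a real agent. An actual validation would trigger the agent's own auto-compaction mid-session and rerun the benchmark tasks.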

→ View original post on X: @alphasignalai