ICYMI: the Terminal-Bench creators just laid out what actually matters for agent evaluation.
Terminals > GUIs
Containers for real rollouts
TB 2.0 = harder tasks + deeper verification pic.twitter.com/zSazGyZYS2
Snorkel AI (@SnorkelAI), December 9, 2025
