Let's keep in mind that these are still super simple "task" evals: little queries served on a platter, even if increasingly difficult. They're super helpful, but when people talk about AGI, they usually have in mind a swarm of autonomous agents performing long-running jobs across
Task Evals vs AGI: The Gap Between Simple Tests and Autonomous Agents