AI Dynamics

Global AI News Aggregator


GPT-4.5 Preview Performance Results on SEAL Benchmarks

GPT-4.5 Preview eval results are out on SEAL:

#2 in Tool Use – Chat
#3 in Tool Use – Enterprise
#3 in EnigmaEval (behind Claude 3.7 Sonnet)
#4 in MultiChallenge
#5 in Humanity’s Last Exam
#6 in VISTA (multimodal)

See rankings here: https://scale.com/leaderboard

→ View original post on X — @alexandr_wang