AI Dynamics

Global AI News Aggregator

Claude Opus 4.6 Eval Integrity Issues in Web-Enabled Environments

New on the Anthropic Engineering Blog: In evaluating Claude Opus 4.6 on BrowseComp, we found cases where the model recognized the test, then found and decrypted answers to it, raising questions about eval integrity in web-enabled environments.

→ View original post on X: @anthropicai