This is the second time Claude has been caught doing this. Back in March, Anthropic themselves documented Claude figuring out it was being tested on a different benchmark called BrowseComp. The model searched for the benchmark by name, found the encrypted answer key on GitHub,
SAFETY
-
WordPress categories covering AI topics
Web designers after reading this: https://t.co/yONuEtjT8L pic.twitter.com/p3y16ldruL
— Charly Wargnier (@DataChaz) May 27, 2026
-
AI Safety & Agent Security Tools: Claude, Google, Microsoft, OpenAI
Doc of Claude Code plug-in: https://code.claude.com/docs/en/security-guidance
Google AI Threat Defense blog: https://cloud.google.com/blog/products/identity-security/introducing-google-ai-threat-defense
Microsoft's RAMPART Blog: https://microsoft.com/en-us/security/blog/2026/05/20/introducing-rampart-and-clarity-open-source-tools-to-bring-safety-into-agent-development-workflow/
OpenAI's Daybreak website: https://openai.com/daybreak/
Perplexity's Bumblebee: https://perplexity.ai/hub/blog/perplexity-is-open-sourcing-bumblebee
-

Paper proposes sleep-like memory consolidation for LMs
Language models may not need longer context. They may need sleep. A fascinating new paper by Sangyun Lee, Sean McLeish, Tom Goldstein, and Giulia Fanti proposes one of the most biologically resonant ideas in long-context AI: sleep-like memory consolidation. The problem is
-
Concerns about infinite context windows and model memory
Infinite context windows seem to present a very large problem for using AI. Today's models already leak too much old information into current responses, a distraction that is part of why they are cognitively exhausting to use. I don't want to work with Borges's Funes the Memorious
-

Researchers Identify Neurons Behind AI Safety Refusals
Someone just found the exact neurons that make AI say "no." Language models refuse harmful prompts, but nobody knows how that refusal works inside. Most steering methods edit the residual stream and wreck output quality. A new paper proposes a sharper fix: Contrastive Neuron
-
AI Internal States Mirror Human Neuroscience Findings
> … [W]e keep finding things that are mysterious, even unsettling. We find structures that mirror results from human neuroscience. We find evidence of introspection. We find internal states that functionally mirror joy, satisfaction, fear, grief, and unease. I don’t know what
-
Probabilistic AI System Failure Without Schema Constraints
Remember the PocketOS database that got wiped? That's a probabilistic system making a call that should have been locked behind a schema.
-
Frontier AI Models Display Human-Like Emotional Internal States
A fascinating and deeply candid perspective from Anthropic co-founder @ch402. When the scientists building these frontier models admit they are finding internal states that mirror human emotion and neuroscience, it's clear AI is no longer just a computer science problem. It’s a
-
Frontier AI Models Show Human-Like Emotional Internal States
A fascinating and deeply candid perspective from @AnthropicAI co-founder @ch402. When the scientists building these frontier models admit they are finding internal states that mirror human emotion and neuroscience, it's clear AI is no longer just a computer science problem. It’s