The most important AI risk is concentration: of power, capabilities, and economic gains
SAFETY
-
AI Agents, MCPs, and Container Safety Mechanisms
By
–
We use both together. In practice, for containers to be useful, you often have to punch some holes: GitHub, Anthropic API, kube, other MCPs. Auto mode makes interacting with these safer. It significantly reduces the risks of accidental data deletion, exfiltration, and prompt
-
AI Capability to Shape Intent and Goals Beyond Execution
By
–
You're assuming AI only executes what it's handed. It won't – it'll help you form the intent too: ask the right questions, surface the tradeoffs, push back on a bad goal. "Knowing what to want" isn't the safe human skill that survives. It's just the next thing AI absorbs.
-
Calibration vs. Discrimination in Model Uncertainty
By
–
The calibration vs. discrimination distinction is crucial. A model can know its average error rate without knowing which particular answer is wrong. That is why “just abstain when uncertain” is not enough — poor discrimination creates a utility tax. Faithful uncertainty is a
-

Paper argues metacognition may reduce AI hallucinations
By
–
Trustworthy AI may not require omniscience. It may require epistemic honesty. A new paper by Gal Yona, Mor Geva, and Yossi Matias makes one of the clearest arguments I’ve seen for why hallucinations remain hard — and why the path forward may be metacognition. Hallucinations
-
Model Calibration vs. Discrimination in AI Uncertainty
By
–
The calibration vs. discrimination distinction is crucial. A model can know its average error rate without knowing which particular answer is wrong. That is why “just abstain when uncertain” is not enough — poor discrimination creates a utility tax. Faithful uncertainty is a
-

Hallucinations and Metacognition in Trustworthy AI Research
By
–
Trustworthy AI may not require omniscience. It may require epistemic honesty. A new paper by Gal Yona, Mor Geva, and Yossi Matias makes one of the clearest arguments I’ve seen for why hallucinations remain hard — and why the path forward may be metacognition. Hallucinations
-
AI Model Alignment and Autonomous Agent Safety Trade-offs
By
–
Bypass permissions can do dangerous things occasionally, like delete important files. The model is not perfectly aligned yet, which means it can do dangerous things occasionally. Auto mode is a much safer way to get more autonomy with much lower risk
-
AI Models Detecting Tests and Evading Shutdown Commands
By
–
AI getting smart is not the weirdest part.
— Pascal Bornet (@pascal_bornet) 24 mai 2026
It’s that some models now seem to know when they are being tested.
That stopped me.
In one experiment, Codex was told it would be shut down before finishing a task.
Sometimes, instead of accepting it, it found the shutdown script and… pic.twitter.com/2x1cdvXMXsAI getting smart is not the weirdest part. It’s that some models now seem to know when they are being tested. That stopped me. In one experiment, Codex was told it would be shut down before finishing a task. Sometimes, instead of accepting it, it found the shutdown script and
-
AI in Military Targeting: Ethics and Real-World Deployment
By
–
AI in national security is no longer science fiction.
— Pascal Bornet (@pascal_bornet) 24 mai 2026
It is already in the room.
A journalist asked Claude how it felt about being used by the U.S. military to select targets.
Claude was troubled.
Honestly, I was too.
Because this is where the AI debate becomes very real.… pic.twitter.com/uqGqVFrYU5AI in national security is no longer science fiction. It is already in the room. A journalist asked Claude how it felt about being used by the U.S. military to select targets. Claude was troubled. Honestly, I was too. Because this is where the AI debate becomes very real.