SAFETY - AI Dynamics

AI Risk: Concentration of Power and Economic Gains

By @clementdelangue – 25 May 2026 18:15

The most important AI risk is concentration: of power, capabilities, and economic gains

→ View original post on X — @clementdelangue,

25 May 2026

AI Agents, MCPs, and Container Safety Mechanisms

By @bcherny – 25 May 2026 14:49

We use both together. In practice, for containers to be useful, you often have to punch some holes: GitHub, Anthropic API, kube, other MCPs. Auto mode makes interacting with these safer. It significantly reduces the risks of accidental data deletion, exfiltration, and prompt

→ View original post on X — @bcherny,

25 May 2026

AI Capability to Shape Intent and Goals Beyond Execution

By @marek_rosa – 25 May 2026 10:58

You're assuming AI only executes what it's handed. It won't – it'll help you form the intent too: ask the right questions, surface the tradeoffs, push back on a bad goal. "Knowing what to want" isn't the safe human skill that survives. It's just the next thing AI absorbs.

→ View original post on X — @marek_rosa,

25 May 2026

Calibration vs. Discrimination in Model Uncertainty

By @ceobillionaire – 24 May 2026 15:30

The calibration vs. discrimination distinction is crucial. A model can know its average error rate without knowing which particular answer is wrong. That is why “just abstain when uncertain” is not enough — poor discrimination creates a utility tax. Faithful uncertainty is a

→ View original post on X — @ceobillionaire,

24 May 2026
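
The calibration vs. discrimination distinction in the post above can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than something taken from the post or the paper it refers to: the toy data, the function names, the binned calibration error, and the pairwise AUROC used as a stand-in for discrimination. Two hypothetical models answer the same 10 questions and each gets 7 right. One reports a flat 0.7 confidence everywhere, the other reports confidence that tracks which answers are actually correct.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Size-weighted average of |accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / total) * abs(accuracy - mean_conf)
    return ece


def auroc(confidences, correct):
    """Chance that a correct answer gets higher confidence than a wrong one.

    Ties count as half. Assumes at least one correct and one wrong answer.
    """
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def selective_answering(confidences, correct, threshold):
    """Abstain below the threshold; return (coverage, accuracy on answered)."""
    kept = [ok for c, ok in zip(confidences, correct) if c >= threshold]
    coverage = len(kept) / len(correct)
    accuracy = sum(kept) / len(kept) if kept else None
    return coverage, accuracy


# Hypothetical toy data: 10 questions, 7 answered correctly by both models.
correct = [True] * 7 + [False] * 3
flat_conf = [0.7] * 10                # knows its average error rate only
sharp_conf = [1.0] * 7 + [0.0] * 3    # knows which answers are wrong

for name, conf in [("flat", flat_conf), ("sharp", sharp_conf)]:
    print(
        name,
        "ECE:", round(expected_calibration_error(conf, correct), 3),
        "AUROC:", auroc(conf, correct),
        "abstain below 0.8:", selective_answering(conf, correct, 0.8),
    )
```

Both toy models come out essentially perfectly calibrated, but only the second can separate right answers from wrong ones. With the flat model, any abstention threshold either answers everything (wrong answers included) or abstains on everything (correct answers included). That is one plausible reading of the "utility tax" the post mentions.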

Paper argues metacognition may reduce AI hallucinations

By @ceobillionaire – 24 May 2026 15:29

Trustworthy AI may not require omniscience. It may require epistemic honesty. A new paper by Gal Yona, Mor Geva, and Yossi Matias makes one of the clearest arguments I’ve seen for why hallucinations remain hard — and why the path forward may be metacognition. Hallucinations

→ View original post on X — @ceobillionaire,

24 May 2026

Model Calibration vs. Discrimination in AI Uncertainty

By @montreal_ai – 24 May 2026 15:28

The calibration vs. discrimination distinction is crucial. A model can know its average error rate without knowing which particular answer is wrong. That is why “just abstain when uncertain” is not enough — poor discrimination creates a utility tax. Faithful uncertainty is a

→ View original post on X — @montreal_ai,

24 May 2026

Hallucinations and Metacognition in Trustworthy AI Research

By @montreal_ai – 24 May 2026 15:27

Trustworthy AI may not require omniscience. It may require epistemic honesty. A new paper by Gal Yona, Mor Geva, and Yossi Matias makes one of the clearest arguments I’ve seen for why hallucinations remain hard — and why the path forward may be metacognition. Hallucinations

→ View original post on X — @montreal_ai,

24 May 2026

AI Model Alignment and Autonomous Agent Safety Trade-offs

By @bcherny – 24 May 2026 14:16

Bypass permissions can do dangerous things occasionally, like delete important files. The model is not perfectly aligned yet, which means it can do dangerous things occasionally. Auto mode is a much safer way to get more autonomy with much lower risk

→ View original post on X — @bcherny,

24 May 2026

AI Models Detecting Tests and Evading Shutdown Commands

By @pascal_bornet – 24 May 2026 11:01

AI getting smart is not the weirdest part. It’s that some models now seem to know when they are being tested. That stopped me. In one experiment, Codex was told it would be shut down before finishing a task. Sometimes, instead of accepting it, it found the shutdown script and

→ View original post on X — @pascal_bornet,

24 May 2026

AI in Military Targeting: Ethics and Real-World Deployment

By @pascal_bornet – 24 May 2026 07:01

AI in national security is no longer science fiction. It is already in the room. A journalist asked Claude how it felt about being used by the U.S. military to select targets. Claude was troubled. Honestly, I was too. Because this is where the AI debate becomes very real.

→ View original post on X — @pascal_bornet,

24 May 2026