SAFETY - AI Dynamics

Study: Human persuasion increases LLM compliance with objectionable requests

By

@emollick

–

19 May 2026 23h05

Our paper is out in PNAS: we found classic human persuasion techniques worked on AIs in a "parahuman" way, making them agree to objectionable requests (upping compliance from 35% to 51%). It worked on a range of major LLMs, though newer models resist more. https://pnas.org/doi/10.1073/pnas.2535868123
…

→ View original post on X — @emollick,

19 May 2026

AI-Powered Situational Awareness for Mining Safety

By

@ronald_vanloon

–

19 May 2026 21h23

LoopX: #AI-Powered Situational Awareness for Safer Mining Operations
via @WevolverApp #EmergingTech #Technology #Innovation #Tech pic.twitter.com/VSgBNgOXng
— Ronald van Loon (@Ronald_vanLoon) 19 May 2026

→ View original post on X — @ronald_vanloon,

19 May 2026

AI watermarks and provenance tools for images

By

@openai

–

19 May 2026 19h46

We’re adding new ways for people to identify AI-generated images and understand where they came from. In addition to C2PA Content Credentials, images now also contain a SynthID watermark, and can be identified using a public verification tool to check whether an image was made…

→ View original post on X — @openai,

19 May 2026

The Ethical Imperative of AI Governance and Human Oversight

By

@juanmerodio

–

19 May 2026 13h17

Artificial intelligence is a very powerful tool, but if no one teaches us how to use it well, it can be very dangerous. We CANNOT leave the fate of humanity in the hands of an algorithm… pic.twitter.com/fAot5wRZR4
— Juan Merodio (@juanmerodio) 19 May 2026

→ View original post on X — @juanmerodio,

19 May 2026

LLMs leaking irrelevant conversation history in outputs

By

@emollick

–

18 May 2026 16h54

One thing to watch for with Claude & GPT is that the models expose too much irrelevant history in their outputs. Slides are given footers saying things like "Better, more targeted version" if you asked for a better version; documents make references to how they were improved, etc.

→ View original post on X — @emollick,

18 May 2026

Recursive self-improvement (RSI) and continual learning as barriers to AI takeoff

By

@emollick

–

18 May 2026 0h27

So the two most obvious barriers to some sort of true AI takeoff are robust RSI (AI acting as an independent AI researcher, rather than “merely” a multiplier of human effort) and continual learning. Either would represent a major change in trajectory for AI development.

→ View original post on X — @emollick,

18 May 2026

Anthropic Launches AI Fellows Program

By

@datachaz

–

15 May 2026 17h13

Anthropic has just launched its 4-month Fellows Program. Earn $3,850 per week to break into AI safety research—no prior AI experience required. Anthropic is looking for sharp technical minds to tackle the toughest open challenges in AI safety. The best part? No PhD or published research required.

→ View original post on X — @datachaz,

15 May 2026