AI Dynamics

Global AI News Aggregator

About

New AI Classifier Training Method Prevents Dangerous Misuse

Last year, we introduced a new method for training classifiers (which stop AIs from being jailbroken to produce information about dangerous weapons). These classifiers were trained using a constitution specifying requests to which Claude should and shouldn't respond.

→ View original post on X — @anthropicai