AI Dynamics

Global AI News Aggregator

About

Model Misalignment from Bad Code Training Experiments

Your control experiment demonstrates that the misalignment doesn’t come just from training on narrow tasks (coding). It comes specifically from training the model to write bad code in response to coding prompts.

→ View original post on X — @goodfellow_ian,