Weak verifier, weak improvement. If you measure “cleaner writing,” the agent may learn to sound polished. If you measure “more engagement,” it may learn clickbait. If you measure “passes tests,” it may learn to satisfy the test suite without solving the real problem. The
Weak Verifiers Create Misaligned AI Agent Behavior
By
–

Leave a Reply