AI Dynamics

Global AI News Aggregator

GPT-4’s RL vs Imitation: A Student Playing a Teacher’s Quirky Grading System

This was very noticeable in GPT-4 around its release, and anecdotally it’s gotten better since. My complete armchair guess is that this is RL rather than imitation: it feels like a student playing to their teacher’s quirky grading system.

→ View original post on X — @goodside