This was very noticeable in GPT-4 around its release, and anecdotally it has gotten better since. My complete armchair guess is that this comes from RL rather than imitation: it feels like a student playing to their teacher's quirky grading system.
GPT-4’s RL vs Imitation: A Student Playing a Teacher’s Quirky Grading System