Every time you chat with Claude (Opus 4.5 especially, but it seems to apply to all models), you quickly realize how clever Claude's post-training is: the attention to detail, the data quality. It's rare for Claude to break in the middle of a conversation. I think this goes beyond the system prompt and training dynamics/physics, and is more about aura-like things: behaviors, personality, when to push back and when to defer, and countless subtle edge cases. It's no surprise that the alignment people there follow the training closely. There was an invited talk at CMU this past spring on alignment, and I asked why Claude's vibes and post-training feel different. The answer was high-level, as you'd expect, but the same: the training, post-training, alignment, and evals teams are closed-loop.

Sam Bowman (@sleepinyourhat): From everything we know so far, Opus 4.5 seems to be the best-aligned model out there in a bunch of ways. I follow the training process closely as part of my work on alignment evaluations. Here's my guess about the two things that are most responsible for making 4.5 special. 🧵
https://nitter.net/sleepinyourhat/status/1997006353647522098#m
Originally posted on X by @jeande_d, 2025-12-06 09:56 UTC