You’re not comparing RLHF to no RLHF here; you’re comparing different generations of post-RLHF mystery models, likely of different sizes. If you don’t do post-training at all, naive attempts to talk to the model go like this:
Discussion on RLHF vs post-training model comparisons
