We find that Claude 3 Opus generates arguments that don't statistically differ in persuasiveness compared to arguments written by humans. We also find a scaling trend across model generations: newer models tended to be rated as more persuasive than previous ones.
