
I think the fact that GPT-4o and Llama 3.3-80B did no significant harm is just as important as whether AI helped. If older (less accurate and more sycophantic) chatbots had essentially no effect on people who followed their advice, that suggests the risk of harm is lower as well.
