It's because the objective isn't truth but attention, and they get RL'd on it, so they're a lot closer to optimal than you give them credit for.
Attention optimization over truth in reinforcement learning systems