AI Dynamics

Global AI News Aggregator

Did RLHF implicitly teach least-to-most output style?

I wonder if the model just learned to prefer a least-to-most output style implicitly from RLHF, because that style leads to better answers.

→ View original post on X — @goodside