“In a nutshell, the joke was that in order to prevent A.I. language models from behaving in scary and dangerous ways, A.I. companies have had to train them to act polite and harmless. One popular way to do this is called “reinforcement learning from human feedback,” or R.L.H.F.”
RLHF: Training AI Models to Be Safe and Polite
By
–
Leave a Reply