AI Dynamics

Global AI News Aggregator


GPT-3 training: SFT vs. RLHF, and missing EOT training

It "isn't pretty," but it still works with effort. GPT-3 was just SFT for a long time: even text-davinci-002 wasn't actually RLHF, just SFT/FeedME, and looping text like this was rare at that point. The issue here, IMO, is the lack of EOT training; without it, you need a frequency penalty or similar workarounds.
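A frequency penalty lowers a token's score in proportion to how often that token has already been generated. This can push a model out of a loop when it never learned to stop on its own. The sketch below assumes the OpenAI-style formula (logit minus count times the frequency penalty, minus a one-time presence penalty if the token has appeared at all). The helper names `apply_penalties` and `greedy_decode` are illustrative. `toy_model` is hypothetical, with fixed scores standing in for a model that never learned to favor the end-of-text token `<eot>`.

```python
from collections import Counter
from typing import Callable, Dict, List


def apply_penalties(
    logits: Dict[str, float],
    generated: List[str],
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> Dict[str, float]:
    """Lower each token's logit by frequency_penalty per prior occurrence,
    plus presence_penalty once if the token has appeared at all
    (assumed OpenAI-style formula)."""
    counts = Counter(generated)
    adjusted = {}
    for token, logit in logits.items():
        c = counts[token]
        adjusted[token] = (
            logit
            - c * frequency_penalty
            - (1.0 if c > 0 else 0.0) * presence_penalty
        )
    return adjusted


def greedy_decode(
    step_logits: Callable[[List[str]], Dict[str, float]],
    steps: int,
    frequency_penalty: float = 0.0,
) -> List[str]:
    """Pick the highest-scoring token each step; stop on <eot> or after `steps`."""
    generated: List[str] = []
    for _ in range(steps):
        logits = apply_penalties(step_logits(generated), generated, frequency_penalty)
        token = max(logits, key=logits.get)
        if token == "<eot>":
            break
        generated.append(token)
    return generated


def toy_model(generated: List[str]) -> Dict[str, float]:
    # Hypothetical model without EOT training: "<eot>" never outscores "la",
    # so unpenalized greedy decoding repeats "la" forever.
    return {"la": 2.0, "done": 1.0, "<eot>": 0.5}


if __name__ == "__main__":
    print(greedy_decode(toy_model, steps=6))
    print(greedy_decode(toy_model, steps=6, frequency_penalty=0.8))
```

Without the penalty, the toy model loops on "la" until the step limit. With `frequency_penalty=0.8`, the repeated token's score falls until "done" and then `<eot>` win. This is the kind of sampling-time workaround the post says a model without EOT training needs.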

Original post on X by @goodside