AI Dynamics

Global AI News Aggregator

text-davinci-003 is the first RLHF model, not text-davinci-002

Also, it seems that the new text‑davinci‑003 is actually the *first* released model to incorporate RLHF tuning, using the PPO algorithm. I've been misdescribing text‑davinci‑002 as an RLHF model for the past year.

→ View original post on X — @goodside