Also, it seems that the new text-davinci-003 is actually the *first* released model to incorporate RLHF tuning, using the PPO algorithm. I've been misdescribing text-davinci-002 as an RLHF model for the past year.