So, it's some new sort of "supervision techniques" > "We trained it using new supervision techniques combined with traditional methods like supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), similar to those used for GPT-4o."
New Supervision Techniques Combined with SFT and RLHF Training