DPO vs RLHF: Genuine Competition or Surface-Level Improvement?
By – Global AI News Aggregator
Yeah, exactly. I wonder whether DPO genuinely competes with RLHF, or whether the models only look good on the surface but are worse under closer inspection (as with imitation models).
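For context on the comparison: DPO replaces RLHF's reward model and RL loop with a direct classification-style loss on preference pairs. A minimal sketch of that loss for a single pair, assuming sequence log-probabilities are already computed (the function name and the beta value here are illustrative, not from any particular library):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy being
    trained and under the frozen reference (SFT) model; beta controls
    how far the policy may drift from the reference.
    """
    # Implicit reward margin: beta-scaled difference of log-ratios
    # between the preferred (chosen) and dispreferred (rejected) response.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin (Bradley-Terry preference model).
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference exactly, the margin is zero and the loss is log 2; the loss falls as the policy assigns relatively more probability to the chosen response than the reference does. Whether optimizing this proxy matches full RLHF in practice is exactly the open question raised above.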