Instead of merely maximizing response quality for a given input, the method maximizes the quality gap of the response relative to a reference, thereby promoting self-improvement. It uses human preference data to build an automated way to improve LLMs over time.
Automated LLM Improvement Through Quality Gap Maximization