What's the closest researchers have gotten to scalable self-play with LLMs? I had an idea that I think is unique and easy to implement, and I'm curious whether there is anything I should take a look at.
Scalable Self-Play with LLMs: Current Research and New Approaches