We explored approaches with varying amounts of automation and human effort. In the simplest case, we generated thousands of yes-no questions for diverse behaviors just by instructing an LM (and filtering out bad examples with another LM).
Automated Generation of Yes-No Questions for LM Behavior Evaluation
By
–
Leave a Reply