We explored approaches with varying amounts of automation and human effort. In the simplest case, we generated thousands of yes-no questions for diverse behaviors just by instructing an LM (and filtering out bad examples with another LM). Random examples of LM-written evals:
Anthropic Explores Automating Language Model Evaluation
By
–
Leave a Reply