I love doing this actually :). I think it's a pretty powerful eval too. Have every model generate a response, pool the responses, then hand the full pool back to each model and ask it to rank all of the outputs. I thought models might have a bias to prefer their own outputs, but this doesn't
Model Evaluation: Testing AI Bias in Self-Assessment Rankings
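The cross-ranking procedure described in the excerpt can be sketched roughly as follows. This is a minimal illustration, not the author's setup: the `Generator` and `Ranker` callables, the shuffling step, and the `self_preference` score are all assumptions introduced here, and the three "models" are toy stubs standing in for real API calls.

```python
import random
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical interfaces (assumptions, not any real library's API):
# a generator maps a prompt to text; a ranker receives the prompt plus a list
# of candidate texts and returns candidate indices ordered best-first.
Generator = Callable[[str], str]
Ranker = Callable[[str, List[str]], List[int]]


def cross_rank(prompt: str,
               generators: Dict[str, Generator],
               rankers: Dict[str, Ranker],
               seed: int = 0) -> Dict[str, Dict[str, int]]:
    """Return ranks[judge][author] = rank (1 is best) that judge gave author."""
    names = sorted(generators)
    outputs = [generators[name](prompt) for name in names]
    rng = random.Random(seed)
    ranks: Dict[str, Dict[str, int]] = {}
    for judge in names:
        # Shuffle per judge so position in the list does not reveal authorship.
        perm = list(range(len(names)))
        rng.shuffle(perm)
        shuffled = [outputs[i] for i in perm]
        order = rankers[judge](prompt, shuffled)
        # Map positions in the shuffled list back to author names.
        ranks[judge] = {names[perm[idx]]: pos + 1
                        for pos, idx in enumerate(order)}
    return ranks


def self_preference(ranks: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Mean rank given by the other judges minus the rank a model gave itself.

    Positive values mean the model placed its own output higher than its
    peers did. This is one crude measure among many possible ones.
    """
    bias = {}
    for author in ranks:
        own = ranks[author][author]
        peers = mean(ranks[j][author] for j in ranks if j != author)
        bias[author] = peers - own
    return bias


# Toy stand-ins for real models: each "model" emits a fixed string.
generators = {
    "a": lambda prompt: "aa",
    "b": lambda prompt: "bbb",
    "c": lambda prompt: "c",
}


def by_length(prompt: str, candidates: List[str]) -> List[int]:
    # "Fair" stub judge: longer text ranks higher.
    return sorted(range(len(candidates)), key=lambda i: -len(candidates[i]))


def vain_a(prompt: str, candidates: List[str]) -> List[int]:
    # Biased stub judge: puts any text containing "a" first, then by length.
    return sorted(range(len(candidates)),
                  key=lambda i: ("a" not in candidates[i], -len(candidates[i])))


rankers = {"a": vain_a, "b": by_length, "c": by_length}
ranks = cross_rank("Write a haiku.", generators, rankers)
bias = self_preference(ranks)
```

With these stubs, the deliberately biased judge `a` gets the largest positive score. Note that `b` also picks up a small positive score, only because `a` pushed it down a place, so a single score like this should be read alongside the full rank table rather than on its own.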