I quite like it as a nice, intuitive testbed for in-context learning. The experiments around example order, label names, label flipping, etc. give a sense of the strength of the prior and of how ICL behaves as an optimizer. Does the performance here also correlate with other LLM
In-Context Learning as Testbed: Example Order and Label Effects