The first step is to understand that when LMs are pre-trained on next-word prediction, they are implicitly performing massive multi-task learning across thousands (perhaps millions) of tasks. Here is a list of some potential tasks.
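To make this concrete, here is a small illustrative sketch (the specific prompts are my own examples, not from the post): predicting the single most likely next word for each of these prompts amounts to solving a different implicit task.

```python
# Each prompt's likely next token corresponds to a distinct implicit "task"
# that the model must learn to do next-word prediction well.
# (Illustrative examples only; any real LM's top prediction may differ.)
examples = [
    ("The capital of France is", "Paris", "factual recall"),
    ("2 + 2 =", "4", "arithmetic"),
    ('"Merci" in English means', "thanks", "translation"),
    ("The movie was dull and slow. Sentiment:", "negative", "sentiment classification"),
]

for prompt, next_token, task in examples:
    print(f"{task:25s} | {prompt!r} -> {next_token!r}")
```

A single training objective, next-word prediction, thus quietly bundles recall, arithmetic, translation, classification, and many more tasks into one loss.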
Language Models Learn Thousands of Tasks During Pre-training