Turns out language modeling (i.e. ~next-word prediction, which is equivalent to compression) of internet text is an excellent objective – very simple to define, and easy to collect data for at scale. It forces the neural net to learn a lot about the world, "multi-tasking" across many domains.
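To make the objective concrete, here is a minimal sketch of the next-token prediction loss in PyTorch. This is an illustration, not any particular codebase's implementation; `model` and `token_ids` are placeholder names, and `model` is assumed to map a token sequence to per-position logits over the vocabulary.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    """Cross-entropy over next-token predictions.

    token_ids: LongTensor of shape (batch, seq_len).
    model(x) is assumed to return logits of shape (batch, seq_len - 1, vocab_size)
    when given inputs of shape (batch, seq_len - 1).
    """
    inputs = token_ids[:, :-1]    # condition on all tokens except the last
    targets = token_ids[:, 1:]    # predict each following token
    logits = model(inputs)

    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
    # Minimizing this loss minimizes the average number of bits needed to
    # encode the next token – the "compression" view of the same objective.
    return loss
```

The only supervision is the text itself: every position in every document supplies a training signal, which is what makes the objective so simple to scale.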
Language Modeling as Universal Learning Objective Through Text Compression