TLDR: LMs have been around forever. The non-obvious finding: if you scale up the training set and use a powerful enough neural net (a Transformer), the network becomes a kind of general-purpose computer over text.
Scaling Language Models: Neural Networks as General-Purpose Text Computers