Right, and indeed ULMFiT was an LSTM. But that earlier work wasn't using language modeling on a general-purpose corpus as a self-supervised task and then fine-tuning the result in two further steps for downstream tasks, as today's LLMs do. (It wouldn't have been feasible on the hardware of the time.)