GPT-2 is the "hello world" of LLMs, I think (there must be a better analogy… err, MOS 6502? xv6?), so that's why I started there. It has a proper paper, its weights are released and available, and a lot is known about it. At this point it is an artifact of historical significance.
GPT-2 as the Hello World of Large Language Models