We train a ~10M parameter GPT for about 15 minutes on a single GPU, using all of Shakespeare concatenated into one ~1MB text file. We can then sample unlimited fake Shakespeare from our baby GPT. Can you spot which passage is real? At only 10M parameters trained from scratch on 1M characters, I hope so 🙂
Training a 10M Parameter GPT Model on Shakespeare Data
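Before any training happens, the ~1MB Shakespeare file has to be turned into a sequence of integers. The sketch below shows the simplest version of that step, a character-level tokenizer; the inline sample string and the variable names are illustrative stand-ins, not taken from the original post.

```python
# Minimal sketch of character-level tokenization for a tiny GPT.
# In practice `text` would be the full ~1MB Shakespeare file read from disk;
# a short stand-in string is used here so the example is self-contained.
text = "To be, or not to be, that is the question."

# The vocabulary is simply every unique character in the corpus.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> char

def encode(s: str) -> list[int]:
    """Map a string to a list of integer token ids."""
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    """Map a list of integer token ids back to a string."""
    return "".join(itos[i] for i in ids)

print(len(chars))                 # vocabulary size for this sample
print(decode(encode("to be")))    # encoding then decoding round-trips
```

On the real Shakespeare file this yields a vocabulary of a few dozen characters, which is why a model this small can be trained from scratch on the data.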