GPT-3 Training Speed: Clarifying the 11-Minute Claim
No, GPT-3 wasn't trained in 11 minutes. What happened is that the GPT-3 architecture was trained on the C4 dataset to a log perplexity of 2.69 in 11 minutes on 3,584 H100 GPUs. Don't fixate on the "11 minutes": a training time means little without the dataset, the quality target, and the hardware behind it. Quoting it alone is like saying "ResNet-50 was trained in 5 seconds on MNIST to 80% accuracy."
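To see what the numbers in that claim amount to, here is a quick back-of-the-envelope sketch in Python. It assumes the 2.69 figure is a log perplexity in nats (the mean per-token negative log-likelihood), so the matching perplexity is its exponential. It also treats "11 minutes" as exact. The helper names are illustrative only.

```python
import math

# Assumption: 2.69 is a log perplexity, i.e. the mean per-token negative
# log-likelihood in nats, so perplexity = exp(log perplexity).
LOG_PERPLEXITY_TARGET = 2.69

# Figures quoted in the post; "11 minutes" is treated as exact here.
NUM_GPUS = 3584
WALL_CLOCK_MINUTES = 11


def perplexity(log_ppl: float) -> float:
    """Convert a natural-log perplexity into a plain perplexity."""
    return math.exp(log_ppl)


def gpu_hours(num_gpus: int, minutes: float) -> float:
    """Total accelerator time consumed: number of GPUs x wall-clock hours."""
    return num_gpus * minutes / 60.0


if __name__ == "__main__":
    print(f"perplexity at target: {perplexity(LOG_PERPLEXITY_TARGET):.2f}")
    print(f"GPU-hours: {gpu_hours(NUM_GPUS, WALL_CLOCK_MINUTES):.1f}")
```

Run as a script, it prints a perplexity of 14.73 and 657.1 GPU-hours. So the "11 minutes" is roughly 657 hours of H100 time, spent reaching one fixed quality target on C4. Neither number tells you how long a full GPT-3 training run takes.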