… and off by ~118,000X if you consider training that language model on a Google TPUv2 in our Iowa datacenter. See Table 1 and Appendices C and D of https://arxiv.org/abs/2104.10350 for more details.
Language Model Training Efficiency on Google TPU Hardware