The last few rows of Table 1 in https://arxiv.org/abs/2104.10350 report the effect of this on energy usage and CO2e emissions: the more accurate Evolved Transformer (Medium) model uses 25% less energy to train on TPUv2 than the Transformer (Big) model does (and the story is similar on P100 GPUs).
Evolved Transformer (Medium) uses 25% less energy to train than Transformer (Big)
