Transformer-Base Training Efficiency on TPU V2 Hardware

As I said above, use of Transformer-Base as a proxy task *is* in So et al.: "Specifically, to train a Transformer to peak performance on WMT'14 En-De requires ∼300K training steps, or 10 hours, in the base size when using a single Google TPU V.2 chip, as we do in our search."
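For a rough sense of what those figures imply, the quoted numbers (~300K steps in ~10 hours on one TPU v2 chip) work out to roughly 8 training steps per second. A minimal back-of-the-envelope sketch, using only the two quoted values:

```python
# Throughput implied by the quoted figures
# (Transformer-Base, WMT'14 En-De, single TPU v2 chip).
# The two inputs below are taken directly from the quote; everything else
# is simple arithmetic, not a claim about the actual training setup.

TRAINING_STEPS = 300_000   # ~300K steps to peak performance (quoted)
WALL_CLOCK_HOURS = 10      # ~10 hours of wall-clock time (quoted)

seconds = WALL_CLOCK_HOURS * 3600
steps_per_second = TRAINING_STEPS / seconds

print(f"Implied throughput: {steps_per_second:.1f} steps/second")
# Implied throughput: 8.3 steps/second
```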

→ View original post on X: @jeffdean
