See Figure 4 in Section 5.3 of So et al. (https://arxiv.org/abs/1901.11117). In the figure, for example, the second-to-last red dot (Evolved Transformer) achieves a higher BLEU score than the last and largest blue dot (the vanilla Transformer configuration) while having 37.6% fewer parameters.
Evolved Transformer Achieves Better BLEU Score With 37.6% Fewer Parameters
