Yes, I totally agree that the fact that the proxy task was Transformer(base) is quite subtle in So et al. (just that one sentence). NAS is generally a once-per-search-space/problem-type (sort of like human researchers coming up with a new model architecture, just automated).
NAS and Transformer Architecture Search Space Subtleties
By
–