Why Can't Transformers Learn Multiplication? This paper found that standard training never learns the long-range dependencies that multi-digit multiplication requires. Adding an auxiliary loss that predicts the “running sum” enables the model to learn multi-digit multiplication successfully!
Transformers Learning Multiplication Through Auxiliary Loss
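To make the "running sum" idea concrete, here is a minimal sketch in plain Python. It assumes that the running sum at output position k is the sum of all digit products that land on that position plus the carry from the previous position; the paper's exact definition may differ. The function names `running_sums`, `product_digits`, and `combined_loss`, the mean-squared-error form of the auxiliary term, and the `aux_weight` value are all hypothetical and chosen only for illustration.

```python
def running_sums(a: int, b: int) -> list[int]:
    """Running sum at each output digit of a * b, least significant digit first.

    Assumption: running sum at position k = sum of a_i * b_j with i + j == k,
    plus the carry coming from position k - 1.
    """
    a_digits = [int(d) for d in reversed(str(a))]
    b_digits = [int(d) for d in reversed(str(b))]
    n_out = len(a_digits) + len(b_digits)

    sums = []
    carry = 0
    for k in range(n_out):
        # Every digit pair (i, j) with i + j == k contributes to position k,
        # which is why each output digit depends on many distant input digits.
        partial = sum(
            a_digits[i] * b_digits[k - i]
            for i in range(len(a_digits))
            if 0 <= k - i < len(b_digits)
        )
        total = partial + carry
        sums.append(total)
        carry = total // 10
    return sums


def product_digits(sums: list[int]) -> list[int]:
    """The digit the model must emit at each position is the running sum mod 10."""
    return [s % 10 for s in sums]


def combined_loss(token_loss: float, predicted_sums: list[float],
                  target_sums: list[int], aux_weight: float = 0.1) -> float:
    """Hypothetical training objective: the usual token loss plus a weighted
    mean-squared error between predicted and true running sums."""
    mse = sum((p - t) ** 2 for p, t in zip(predicted_sums, target_sums)) / len(target_sums)
    return token_loss + aux_weight * mse


if __name__ == "__main__":
    targets = running_sums(12, 34)
    print(targets)
    print(product_digits(targets))
```

In this sketch the running sums serve as extra regression targets alongside the output digits, so the model gets a direct training signal for the intermediate quantity that ties distant input digits together.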
