Transformers Excel at Interpolation but Struggle with Symbolic Learning

Not saying that Transformers are worse than RNNs, mind you — Transformers are *the best* at *what deep learning does* (generalizing via interpolation), specifically *because* of their strongly interpolative architecture prior (MHA). They are, however, worse at symbolic learning.
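To make the "interpolative prior" concrete, here is a minimal toy sketch (my own illustration, not anything from a real Transformer implementation): softmax attention produces convex combinations of stored values, so a query inside the training range is answered by blending nearby examples, while a query far outside the range can never produce an output beyond the largest stored value — the mechanism interpolates but cannot extrapolate a symbolic rule like f(x) = 2x.

```python
import math

def attention_predict(query, keys, values, temp=0.1):
    # Softmax-weighted average over stored (key, value) pairs,
    # mirroring how attention forms convex combinations of values.
    scores = [-abs(query - k) / temp for k in keys]
    m = max(scores)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / z

# "Training set": the rule f(x) = 2x sampled only on [0, 1]
keys = [i / 10 for i in range(11)]
values = [2 * k for k in keys]

in_range = attention_predict(0.55, keys, values)   # inside training support
out_range = attention_predict(5.0, keys, values)   # far outside support

print(in_range)   # close to the true value 1.1
print(out_range)  # stuck near 2.0, nowhere near the true value 10.0
```

Because the output is a convex combination of the stored targets, it is bounded by their range regardless of the query — a one-line illustration of why an interpolative prior handles in-distribution generalization well but fails on rules that demand extrapolation.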