There has been some incredible work extending transformers to other tasks (e.g., ViT) and improving their efficiency (e.g., FlashAttention), but deep down it's the same transformer, with some minor changes here and there.
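To make that concrete, here is a minimal sketch of the scaled dot-product attention at the heart of the original transformer: the same operation that ViT applies to image patches and that FlashAttention computes with a faster memory access pattern. The function name and shapes are illustrative, not taken from any particular library.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Core attention from "Attention Is All You Need".

    q, k: (seq_len, d_k) arrays; v: (seq_len, d_v) array.
    """
    d_k = q.shape[-1]
    # Similarity scores between queries and keys, scaled so the
    # softmax stays well-behaved as d_k grows.
    scores = q @ k.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension turns
    # scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ v

# Illustrative usage with random inputs.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
out = scaled_dot_product_attention(q, k, v)  # shape (4, 8)
```

Whether the inputs are word embeddings or flattened image patches, and whether the softmax is materialized in full or computed blockwise as in FlashAttention, this is the operation being run.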