Transformers Token Processing: Information Storage Nuances
Yes! There is a lot of nuance. My preferred way to put it is: "transformers don't pre-store information for future tokens at the expense of the current token" (or at least, not very much).