Still struggling with the “transformers are the new type of logic gate” analogy. It's a greedier architecture that works until we run out of data. How does that put it on the same level as transistors?
Transformers as Logic Gates: Architecture Limits and Data Scaling