yep exactly, great work spelling it out step by step.
sometimes I talk about it as "breadth is free, depth is expensive" in the imagined full compute graph of the neural net. afaik this was the major insight / inspiration behind the Transformer in the first place. The first time
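The breadth-vs-depth contrast can be sketched in a few lines of NumPy: an RNN-style recurrence has a compute graph whose depth grows with sequence length T (each step depends on the last), while an attention-style layer is wide but shallow — all T tokens interact in one batched matrix product, so depth stays constant in T. This is a minimal illustrative sketch, not any particular library's implementation; the sizes, weights, and the plain dot-product "attention" are all made up for the example.

```python
import numpy as np

T, d = 8, 4  # hypothetical sequence length and hidden size
rng = np.random.default_rng(0)
x = rng.standard_normal((T, d))

# RNN-style: each hidden state depends on the previous one,
# so the compute graph is T dependent steps deep along the sequence.
W = rng.standard_normal((d, d)) * 0.1
h = np.zeros(d)
for t in range(T):  # T sequential steps -- depth grows with T
    h = np.tanh(x[t] + W @ h)

# Attention-style: every token interacts with every other token in
# one matrix product -- lots of breadth, but depth constant in T.
scores = x @ x.T  # (T, T) pairwise interactions, all at once
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
out = weights @ x  # all T outputs computed in parallel

print(h.shape, out.shape)  # (d,) vs (T, d)
```

The loop is cheap per step but inherently serial; the attention path does more total arithmetic yet parallelizes trivially on hardware, which is the "breadth is free" half of the slogan.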
Breadth is free, depth is expensive in neural network compute graphs