Waiting for @untitled01ipynb to make an “if you use 100% of your parameters” meme. What's the intuition behind the 10x size : 2x capability ratio? Feels like even if the network is sparsely connected, it should scale better than that.
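One way to put a number on the 10x : 2x ratio is to read it as a power law, capability ∝ size^alpha. That is an assumption here, not something the question states, and "capability" is left undefined. Under it, the ratio pins down the exponent. The sketch below is only that arithmetic; the function names `implied_exponent` and `size_multiplier_needed` are made up for illustration.

```python
import math


def implied_exponent(size_ratio: float, capability_ratio: float) -> float:
    """Exponent alpha such that capability scales as size**alpha.

    Assumes a pure power law, which is a simplification.
    """
    return math.log(capability_ratio) / math.log(size_ratio)


def size_multiplier_needed(capability_ratio: float, alpha: float) -> float:
    """How much larger the model must be for a given capability gain,
    under the same assumed power law."""
    return capability_ratio ** (1.0 / alpha)


# The ratio from the question: 10x the parameters for 2x the capability.
alpha = implied_exponent(10.0, 2.0)
print(f"implied exponent alpha = {alpha:.3f}")

# Under that exponent, another doubling (4x total) needs another 10x in size.
print(f"size multiplier for 4x capability = {size_multiplier_needed(4.0, alpha):.1f}")
```

If the power-law reading holds, the exponent comes out to log10(2), roughly 0.3, so every doubling of capability costs another 10x in parameters. That is the sense in which the ratio feels steep.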
Parameter Efficiency and Scaling Laws in Neural Networks