AI Dynamics

Global AI News Aggregator

SGD and Transformer Network Depth Rules Matter

No, I asserted here that SGD was *a* thing. I mention elsewhere that the fact that a classical 100-layer transformer network obeys its own "100-step rule" is another thing that matters (and also is not inaccessibly deep mathematics).

→ View original post on X — @esyudkowsky
