No, I asserted here that SGD was *a* thing. Elsewhere I mention that a classical 100-layer transformer network obeys its own "100-step rule", which is another thing that matters (and is also not inaccessibly deep mathematics).