AI Dynamics

Global AI News Aggregator

About

Scaling Per-Example Gradients: Algorithm Efficiency Improvements

our algorithm is a bit complicated, mostly because computing per-example gradients is hard to do at scale so we make some efficiency improvements:
– computing grads w vmap
– only using last-layer grads (which are still big, in the case of LMs)
– projecting them to a smaller dim

→ View original post on X — @jxmnop