AI Dynamics

Global AI News Aggregator

Adafactor vs Adam: Optimizer Choice for Small Neural Networks

ok, good point, and I AGREE that the details matter, but these kids aren't trying to replicate T5 pretraining in 20 hours on a single GPU; they just want to train little neural networks from scratch, and for their use cases Adafactor vs Adam will almost never matter

→ View original post on X — @jxmnop