AI Dynamics

Global AI News Aggregator

About

DDP Gradient Bucketing and Activation Checkpointing Optimization

I'm sorry, this makes no sense.
– Gradient bucketing is literally part of DDP
– DDP is a networking thing, obs you don't compile that
– Activation checkpointing works fine once hooks are disabled during tracing (ofc)

→ View original post on X — @jeremyphoward