I'm with @JeffDean on this. DistBelief taught us early important lessons about scaling up deep learning, and it was general enough for many algorithms including supervised backprop. Obviously, we got a lot of software and hardware architecture details "wrong" back in 2012 —
DistBelief Lessons: Scaling Deep Learning Architecture Evolution