Memory Efficient Neural Networks: Activation Storage Optimization

This is useful for memory-efficient NNs because, with a careful choice of architecture, you don't have to store the activations of every layer. If each layer is invertible, you can store just the final layer's activations and recompute the earlier ones as you work backwards through the network. The example shows that the operation composes, so you can build deep networks this way. 5/n
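Here is a minimal sketch of the idea, assuming the thread is describing reversible layers in the additive-coupling (RevNet) style; the names F, G, couple_forward, and couple_inverse are hypothetical illustrations, not code from the thread:

```python
import numpy as np

# Hypothetical sub-functions. Invertibility comes from the coupling
# structure below, not from F and G, so any functions work here.
def F(x):
    return np.tanh(x)

def G(x):
    return np.tanh(x)

def couple_forward(x1, x2):
    """One reversible (additive-coupling) layer."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def couple_inverse(y1, y2):
    """Recover the layer's inputs exactly from its outputs."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

# Compose several layers; only the final activations are kept in memory.
x1, x2 = np.random.randn(4), np.random.randn(4)
h1, h2 = x1, x2
depth = 5
for _ in range(depth):
    h1, h2 = couple_forward(h1, h2)

# During the backward pass, recompute each layer's input from its
# output instead of having stored every intermediate activation.
for _ in range(depth):
    h1, h2 = couple_inverse(h1, h2)

assert np.allclose(h1, x1) and np.allclose(h2, x2)
```

Because the inverse of a composition is the composition of the inverses applied in reverse order, stacking more of these layers keeps the whole network invertible, which is what lets the memory cost stay roughly constant in depth.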