Once you have the forward/backward passes, the rest (the data loader, the Adam update, etc.) is mostly trivial. The real fun starts now, though: I am porting this to CUDA layer by layer so that it can be made efficient, perhaps even coming within a reasonable fraction of PyTorch's performance.
Forward/Backward Implementation Complete, Now Optimizing with CUDA