The use of zero convolutions during training produces a curious behavior that the authors call the “sudden convergence phenomenon”: at some point in training, the model abruptly becomes able to follow the input conditions, as depicted in the following image:
Zero Convolutions Training: Sudden Convergence Phenomenon Explained
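
To make the term concrete, here is a minimal sketch of a zero convolution, assuming it means a 1x1 convolution whose weights and bias are initialized to zero. It is reduced to a single pixel and written in plain Python; the helper names (`zero_conv_init`, `conv1x1`, `sgd_step`), the input values, the upstream gradient, and the learning rate are all illustrative assumptions, not the authors' code. The sketch only shows that such a layer outputs zero at the start yet still receives a nonzero gradient, so it can begin passing the condition signal after an update; it does not reproduce the sudden convergence itself.

```python
# Sketch of a zero-initialized 1x1 convolution applied at one pixel.
# All names and numbers are hypothetical, chosen only for illustration.

def zero_conv_init(in_ch, out_ch):
    """Return a weight matrix and bias vector filled with zeros."""
    weight = [[0.0] * in_ch for _ in range(out_ch)]
    bias = [0.0] * out_ch
    return weight, bias

def conv1x1(weight, bias, x):
    """At a single pixel, a 1x1 convolution is a linear map over channels."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def sgd_step(weight, bias, x, grad_y, lr):
    """One plain SGD update, using dy_o/dw_oi = x_i and dy_o/db_o = 1."""
    new_weight = [[w - lr * g * xi for w, xi in zip(row, x)]
                  for row, g in zip(weight, grad_y)]
    new_bias = [b - lr * g for b, g in zip(bias, grad_y)]
    return new_weight, new_bias

x = [0.5, -1.0, 2.0]        # assumed condition features at one pixel
grad_y = [1.0, -1.0]        # assumed gradient arriving from the loss

weight, bias = zero_conv_init(in_ch=3, out_ch=2)
y_before = conv1x1(weight, bias, x)   # all zeros: the branch adds nothing yet

weight, bias = sgd_step(weight, bias, x, grad_y, lr=0.1)
y_after = conv1x1(weight, bias, x)    # nonzero: the weights have started to move
```

The gradient with respect to each weight equals the input feature, not the weight, which is why zero initialization does not leave the layer stuck at zero as long as the input is nonzero.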
