This is the training degeneracy that @jiawzhao is talking about: it shows up when you use only identity initialization, without Hadamard, for rectangular weight matrices.
Training Degeneracy in Rectangular Weight Matrices Without Hadamard