Linear and LayerNorm Biases Appear Useless in Experiments
I'll give it a shot! Btw it is biases in both Linear and LayerNorm that appear to be useless (from my admittedly smaller scale experiments).
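
For anyone who wants to try the same thing, here is a rough stdlib-only sketch of what "no bias" means for the two layers. The function names `linear` and `layer_norm`, the toy weights, and the `eps` value are illustrative assumptions, not the code from the experiments mentioned above.

```python
import math

def linear(x, weight, bias=None):
    """Compute y = W x, adding b only if a bias is given.

    `weight` is a list of rows; `bias` defaults to None (bias-free).
    """
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out

def layer_norm(x, gain, bias=None, eps=1e-5):
    """Normalize x to zero mean and unit variance, then scale by `gain`.

    The additive `bias` term is optional and omitted by default.
    """
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    out = [g * (xi - mean) * inv_std for g, xi in zip(gain, x)]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out

# Toy values, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
gain = [1.0, 1.0, 1.0, 1.0]
weight = [[0.5, -0.5, 0.25, 0.0],
          [0.0, 1.0, -1.0, 0.5]]

h = layer_norm(x, gain)   # no LayerNorm bias
y = linear(h, weight)     # no Linear bias
```

In PyTorch the equivalent switch is `bias=False` on `nn.Linear`, and on `nn.LayerNorm` in releases that accept that argument (2.1 and later, as far as I know).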