The result is a hybrid model with multiplicative gates and short convolutions:
– 10 double-gated short-range LIV convolution blocks
– 6 grouped query attention (GQA) blocks

It's REALLY fast, especially on CPU!
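To make "double-gated short convolution" a bit more concrete, here is a minimal NumPy sketch of what one such block might look like. Everything in it is an assumption for illustration, not the model's actual code: the function names, the kernel size of 3, the single input projection split into two gates plus a hidden stream, and the use of a causal depthwise convolution.

```python
import numpy as np

def causal_depthwise_conv(x, kernel):
    """Short causal convolution, one filter per channel.

    x: (seq_len, dim), kernel: (k, dim). Output at position t only
    depends on positions t-k+1 .. t of the input.
    """
    k = kernel.shape[0]
    # Left-pad with zeros so no position can see the future.
    padded = np.concatenate([np.zeros((k - 1, x.shape[1])), x], axis=0)
    out = np.zeros_like(x)
    for j in range(k):
        out += kernel[j] * padded[j : j + x.shape[0]]
    return out

def double_gated_short_conv(x, w_in, kernel, w_out):
    """Hypothetical double-gated block: gate, short conv, gate, project.

    w_in: (dim, 3 * dim), kernel: (k, dim), w_out: (dim, dim).
    """
    # One projection yields two input-dependent gates and a hidden stream.
    b_gate, c_gate, h = np.split(x @ w_in, 3, axis=-1)
    h = b_gate * h                        # first multiplicative gate
    h = causal_depthwise_conv(h, kernel)  # short-range mixing over time
    h = c_gate * h                        # second multiplicative gate
    return h @ w_out

# Toy sizes, chosen only for the demo.
rng = np.random.default_rng(0)
seq_len, dim, k = 8, 4, 3
x = rng.standard_normal((seq_len, dim))
w_in = rng.standard_normal((dim, 3 * dim))
kernel = rng.standard_normal((k, dim))
w_out = rng.standard_normal((dim, dim))

y = double_gated_short_conv(x, w_in, kernel, w_out)
print(y.shape)
```

The per-token work here is a couple of small matrix multiplies plus a k-tap filter, with no attention matrix over the whole sequence, which is one plausible reason blocks like this are cheap on CPU.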
