Yes, they are all mathematically similar. The combined QKV variant is interesting because it replaces the three separate weight matrices with a single one: it performs one matrix multiplication and then splits the result into queries, keys, and values. It's kind of analogous to what the implementation at the end of chapter 3 does with
Mathematical Similarity in QKV Weight Matrix Implementations
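To make the combined-QKV idea concrete, here is a minimal sketch in plain Python rather than a deep learning framework. The matrices, their sizes, and the helper names `matmul` and `hconcat` are all hypothetical, chosen only for illustration. It assumes the combined weight matrix is simply the three separate matrices placed side by side; an actual implementation may arrange or reshape the combined output differently.

```python
# Sketch: one combined QKV projection vs. three separate projections.
# All values and helper names here are illustrative assumptions.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def hconcat(*mats):
    """Concatenate matrices column-wise (all must have the same row count)."""
    return [sum(rows, []) for rows in zip(*mats)]

# Input: 2 tokens, each with d_in = 3 features.
x = [[1, 2, 3],
     [4, 5, 6]]

# Three separate projection matrices, each mapping d_in = 3 to d_out = 2.
d_out = 2
w_q = [[1, 0], [0, 1], [1, 1]]
w_k = [[2, 1], [0, 3], [1, 0]]
w_v = [[0, 1], [1, 0], [2, 2]]

# Variant 1: three separate matrix multiplications.
q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)

# Variant 2: one multiplication with the side-by-side matrix (3 x 6),
# then split each output row into its query, key, and value parts.
w_qkv = hconcat(w_q, w_k, w_v)
qkv = matmul(x, w_qkv)
q2 = [row[:d_out] for row in qkv]
k2 = [row[d_out:2 * d_out] for row in qkv]
v2 = [row[2 * d_out:] for row in qkv]

# Both variants produce the same queries, keys, and values.
assert (q2, k2, v2) == (q, k, v)
```

Each output column of the combined product depends only on the matching column of the combined weight matrix, which is why splitting the result recovers the three separate projections under this layout.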