You mean as in LoRA (low-rank adaptation)? There you learn a low-rank factorization of the weight *update* (ΔW = BA), and only during fine-tuning; the pretrained weights stay frozen. The self-attention computation itself is unchanged — the effective weight matrix is simply W + BA instead of W.
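A minimal NumPy sketch of the idea (shapes and rank are illustrative, not from any particular model): the frozen weight `W` is augmented with a low-rank correction `B @ A`, and because `B` starts at zero, the adapted model initially matches the base model exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2  # hidden size and LoRA rank (r << d)

# Frozen pretrained projection weight, e.g. a query projection W_q.
W = rng.normal(size=(d, d))

# LoRA factors: only A and B receive gradients during fine-tuning.
# B is initialized to zero so that W + B @ A == W at the start.
A = rng.normal(size=(r, d)) * 0.01
B = np.zeros((d, r))

x = rng.normal(size=(d,))

# Base forward pass vs. LoRA forward pass: same computation,
# plus a cheap low-rank correction B @ (A @ x).
base_out = W @ x
lora_out = W @ x + B @ (A @ x)

# Identical at initialization because B is zero.
assert np.allclose(base_out, lora_out)

# After training, the factors can be merged into a single matrix,
# so inference runs the ordinary attention math with W + B @ A.
W_merged = W + B @ A
assert np.allclose(W_merged @ x, lora_out)
```

The merge step at the end is why LoRA adds no inference latency: once `B @ A` is folded into `W`, the attention mechanism sees one dense matrix, exactly as before fine-tuning.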
LoRA Weight Matrix Factorization During Fine-tuning Explained