Anyone got any guesses as to why high-rank LoRA might train worse than both low-rank LoRA *and* full finetune? Must be something about the SGD training dynamics, right? https://x.com/aicrumb/status/1724911477725778148
Why High-Rank LoRA Trains Worse Than Low-Rank and Full Finetuning