Question
Why does low-rank adaptation work so well in practice?
The premise is that the weight update during adaptation has low intrinsic rank. Intuitively I get why that could hold, but is there evidence for how low the rank actually needs to be across tasks? When does rank-1 or rank-2 start to hurt?