Explanation
My mental model: LoRA ≈ a cheap, composable delta on a frozen base
The way I think about it: you keep one frozen base model and store only tiny ΔW = BA per task. That makes adapters composable and swappable at inference with near-zero overhead (you can merge BA into W). The practical win isn't just memory during training — it's serving many tasks from one base. Correct me if I'm overstating the inference story.