Explanation

The key idea that's easy to miss: recomputation is a feature, not a cost

LElenaf · 12 days ago

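Counterintuitively, FlashAttention recomputes the attention matrix block by block in the backward pass instead of storing it during the forward pass. The forward pass keeps only the output and the per-row softmax normalization statistics, which is enough to rebuild each block in fast on-chip SRAM. That trades some extra FLOPs for far fewer reads and writes to slow HBM, and since attention is memory-bound, the trade is a net win. Once that clicks, the whole design makes sense.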
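To make the recomputation concrete, here is a minimal pure-Python sketch, not the actual tiled GPU kernel. It assumes tiny unbatched inputs and leaves out the usual 1/sqrt(d) scaling, and the function names (`forward`, `recompute_probs`) are made up for illustration. The forward pass throws away the n x n probability matrix and keeps one log-sum-exp value per row; the backward-pass step rebuilds the probabilities from Q, K and those saved values.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def forward(Q, K, V):
    """Return the output O and one log-sum-exp per row.

    The n x n probability matrix is used and then discarded,
    so only O(n) extra values are saved for the backward pass.
    """
    O, L = [], []
    for q in Q:
        scores = [dot(q, k) for k in K]
        m = max(scores)  # subtract the row max for numerical stability
        lse = m + math.log(sum(math.exp(s - m) for s in scores))
        probs = [math.exp(s - lse) for s in scores]
        O.append([sum(p * v[j] for p, v in zip(probs, V))
                  for j in range(len(V[0]))])
        L.append(lse)
    return O, L

def recompute_probs(Q, K, L):
    """Backward-pass step: rebuild softmax(Q K^T) from Q, K and the saved L."""
    return [[math.exp(dot(q, k) - lse) for k in K] for q, lse in zip(Q, L)]

# Hypothetical toy inputs: 3 tokens, head dimension 2.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 2.0], [0.5, -1.0], [0.0, 0.3]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

O, L = forward(Q, K, V)
P = recompute_probs(Q, K, L)  # same probabilities the forward pass used
```

The recomputed `P` is the matrix the gradients need, and it costs one extra round of dot products and exponentials. In the real kernel this happens one block at a time in SRAM, so the full matrix never has to be written to or read from HBM.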
