Explanation · Re: Fig. 3

How should we interpret the multi-head attention visualization?

LElenaf · ETH Zürich · 6 days ago

People often read too much into attention heatmaps. I think it's worth being careful: attention weights show where a head puts its weight, not necessarily which inputs the model's output causally depends on. Curious how others interpret Figure 3-style visualizations.
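To make the distinction concrete, here is a toy sketch, not taken from the paper. It assumes a single query attending over three tokens, with hand-picked scores and value vectors. The helper names (`softmax`, `attend`, `ablation_effect`) are hypothetical, and "causal use" is approximated by one simple notion: how much the head's output moves when a token is masked out before the softmax.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of raw attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(weights, values):
    # Attention output: weighted sum of the value vectors.
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def ablation_effect(scores, values, idx):
    # L2 change in the output when token `idx` is masked out before the softmax.
    base = attend(softmax(scores), values)
    kept = [i for i in range(len(scores)) if i != idx]
    ablated = attend(softmax([scores[i] for i in kept]),
                     [values[i] for i in kept])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(base, ablated)))

# Hand-picked toy numbers (assumptions, not data from the paper):
# token 0 gets most of the attention but carries an almost-zero value vector.
scores = [3.0, 1.0, 1.0]
values = [[0.01, 0.0], [1.0, 0.0], [-1.0, 0.0]]

weights = softmax(scores)
effects = [ablation_effect(scores, values, i) for i in range(len(scores))]

for i, (w, e) in enumerate(zip(weights, effects)):
    print(f"token {i}: attention weight = {w:.3f}, ablation effect = {e:.3f}")
```

In this setup the token that would be brightest in a heatmap (token 0) changes the output least when removed, while a token with a much smaller weight changes it more. Real models add further layers, residual paths and multiple heads, so this only illustrates why weights and causal influence can come apart; it does not measure anything about Figure 3.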

0 Replies
