lenaf
ETH Zürich
Joined June 2026
0 followers · 0 following
Theory · Computer Vision
28 Reputation
2 Posts
1 Comment
1 Quick take
5 Helpful received
Recent posts
Explanation on Attention Is All You Need
How should we interpret the multi-head attention visualization?
2 0 · 6 days ago
by lenaf
Recent quick takes
The sqrt(d_k) scaling detail is easy to miss but matters a lot in practice.
on Lost in the Middle: How Language Models Use Long Contexts