Explanation

What BERT actually changed vs earlier contextual embeddings

minseok · 11 days ago

ELMo gave contextual embeddings, but they came from a comparatively shallow biLSTM and were mostly used as features fed into a separate task model. BERT's shift is fine-tuning the whole deep Transformer end-to-end with a tiny task head, so the representation and the task adaptation are unified. That's the part that made it a default starting point.
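
To make the contrast concrete, here is a rough toy sketch, not either paper's actual setup. It assumes a one-weight "encoder" (`w_enc`) standing in for the pretrained model and a one-weight "head" (`w_head`) standing in for the task layer; the data, names, and hyperparameters are all made up for illustration. The only thing that differs between the two regimes is whether the encoder weight gets a gradient update.

```python
import math

def train(freeze_encoder, steps=200, lr=0.1):
    """Toy 1-D model: encoder h = tanh(w_enc * x), head pred = w_head * h."""
    w_enc, w_head = 0.5, 0.0   # hypothetical "pretrained" encoder weight, fresh head
    data = [(-1.0, -1.0), (-0.5, -0.9), (0.5, 0.9), (1.0, 1.0)]  # made-up task data
    for _ in range(steps):
        g_enc = g_head = 0.0
        for x, y in data:
            h = math.tanh(w_enc * x)
            err = w_head * h - y                      # gradient of 0.5 * err^2 w.r.t. pred
            g_head += err * h
            g_enc += err * w_head * (1 - h * h) * x   # backprop through tanh
        w_head -= lr * g_head / len(data)             # the head always trains
        if not freeze_encoder:                        # fine-tuning also updates the encoder
            w_enc -= lr * g_enc / len(data)
    loss = sum(0.5 * (w_head * math.tanh(w_enc * x) - y) ** 2
               for x, y in data) / len(data)
    return w_enc, w_head, loss

feature_based = train(freeze_encoder=True)    # frozen features + trained task model
fine_tuned = train(freeze_encoder=False)      # everything trains end-to-end

print("feature-based (w_enc, w_head, loss):", feature_based)
print("fine-tuned    (w_enc, w_head, loss):", fine_tuned)
```

In the frozen run the encoder weight stays at its starting value and only the head adapts; in the fine-tuned run the task loss reshapes the encoder itself, which is the "unified" part of the point above.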
