Critique · Re: Table 1

Do zero-shot numbers overstate real-world robustness?

LElenaf · ETH Zürich · 6 days ago

Zero-shot accuracy is impressive, but prompt engineering and dataset overlap could inflate it. A fairer protocol would fix prompts in advance and audit train/test overlap. Has anyone quantified the prompt-selection effect?
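
To make the two checks concrete, here is a minimal sketch of how they might be computed. Everything in it is an assumption for illustration, not the paper's protocol: the function names are hypothetical, overlap is approximated by shared word n-grams (n = 8 is an arbitrary default), and the prompt-selection effect is taken to be the gap between the best test-set prompt and the prompt that would have been chosen on a held-out dev set.

```python
from typing import Dict, List, Set, Tuple


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase, whitespace-tokenized word n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_rate(train_texts: List[str], test_texts: List[str], n: int = 8) -> float:
    """Fraction of test examples sharing at least one word n-gram with training data.

    A crude contamination proxy (assumption): real audits often add
    normalization, fuzzy matching, or deduplication on top of this.
    """
    if not test_texts:
        return 0.0
    train_grams: Set[Tuple[str, ...]] = set()
    for text in train_texts:
        train_grams |= word_ngrams(text, n)
    flagged = sum(1 for text in test_texts if word_ngrams(text, n) & train_grams)
    return flagged / len(test_texts)


def prompt_selection_gap(dev_acc: Dict[str, float], test_acc: Dict[str, float]) -> float:
    """Test accuracy of the best-on-test prompt minus that of the prompt picked on dev.

    A large gap suggests that reporting the best test-set prompt overstates
    what a prompt fixed in advance would achieve.
    """
    chosen = max(dev_acc, key=dev_acc.get)  # prompt fixed in advance, using dev data only
    oracle = max(test_acc.values())         # best prompt in hindsight on the test set
    return oracle - test_acc[chosen]
```

Reporting both numbers next to the headline zero-shot accuracy would show how much of it survives a fixed-prompt, overlap-audited protocol.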
