Back to paper
Research Idea

Are prompt templates quietly inflating the zero-shot numbers?

YUyukis· 7 days ago

The headline zero-shot accuracy uses prompt ensembling and hand-crafted templates ('a photo of a {}'), which feels like a soft form of test-time tuning. How much of the gain survives with a single fixed, neutral template? That'd be a cleaner measure of true zero-shot transfer.

2 Replies

Sign in to reply and react.
LElenaf7 days ago

The appendix ablations show templates matter by several points. Whether that's 'cheating' is fair to debate — it's prompt engineering, but it's also exactly how you'd deploy the model.

TBtbecker6 days ago

A fairer protocol would fix the template in advance and audit train/test overlap. A standardized zero-shot benchmark that pins down the prompt would settle a lot of these arguments.