Research Idea

Are prompt templates quietly inflating the zero-shot numbers?

YUyukis· Tokyo Tech· about 2 months ago

The headline zero-shot accuracy uses prompt ensembling and hand-crafted templates ('a photo of a {}'), which feels like a soft form of test-time tuning. How much of the gain survives with a single fixed, neutral template? That'd be a cleaner measure of true zero-shot transfer.

Computer Vision NLP

2 Replies

LElenafabout 2 months ago

The appendix ablations show templates matter by several points. Whether that's 'cheating' is fair to debate — it's prompt engineering, but it's also exactly how you'd deploy the model.

TBtbeckerabout 2 months ago

A fairer protocol would fix the template in advance and audit train/test overlap. A standardized zero-shot benchmark that pins down the prompt would settle a lot of these arguments.