Research Idea
Are prompt templates quietly inflating the zero-shot numbers?
The headline zero-shot accuracy uses prompt ensembling and hand-crafted templates ('a photo of a {}'), which feels like a soft form of test-time tuning. How much of the gain survives with a single fixed, neutral template? That'd be a cleaner measure of true zero-shot transfer.