Critique
Re: Table 2
Is the baseline comparison in Table 2 strong enough?
The positional analysis is compelling, but I'm not sure the baselines in Table 2 control for prompt formatting. Models can differ in how sensitive they are to delimiter choices, and that sensitivity could confound the position effect. Did anyone check whether the effect is robust to formatting changes?
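
To make the question concrete, here is roughly the kind of check I have in mind: recompute the per-position score separately for each delimiter style and see whether the curve keeps its shape. This is only a sketch under my own assumptions. The delimiter list, the example layout (`distractors` plus `target`), and the `score` callable are all hypothetical placeholders, not the paper's actual setup, and it assumes every example has the same number of distractors.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical delimiter variants; the paper's real prompt format is not assumed here.
DELIMITERS = ["\n\n", "\n---\n", "\n###\n"]


def build_prompt(docs: Sequence[str], delimiter: str) -> str:
    """Join the context documents with the given delimiter."""
    return delimiter.join(docs)


def place_target(distractors: Sequence[str], target: str, position: int) -> List[str]:
    """Return a new document list with the target inserted at `position`."""
    docs = list(distractors)
    docs.insert(position, target)
    return docs


def position_by_format(
    examples: Sequence[dict],
    score: Callable[[str, dict], float],
    delimiters: Sequence[str] = DELIMITERS,
) -> Dict[str, List[float]]:
    """Mean score at each target position, computed separately per delimiter.

    `score(prompt, example)` is a stand-in for "run the model and grade it".
    Assumes all examples share the same number of distractors.
    """
    n_positions = len(examples[0]["distractors"]) + 1
    results: Dict[str, List[float]] = {}
    for delim in delimiters:
        per_position = []
        for pos in range(n_positions):
            scores = [
                score(build_prompt(place_target(ex["distractors"], ex["target"], pos), delim), ex)
                for ex in examples
            ]
            per_position.append(mean(scores))
        results[delim] = per_position
    return results
```

If the per-position curves look the same across delimiters, formatting is probably not driving the result. If they diverge, the position effect in Table 2 is at least partly confounded with the prompt template.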