Question
How should a reader interpret a single failed replication?
One failed replication doesn't falsify an effect: sampling error, hidden moderators, and protocol differences can each produce a null result even when the effect is real. But a systematic pattern of failures across many studies does say something about a field. How do you all calibrate between 'this one study is noisy' and 'this literature is fragile'?
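
To make the question concrete, here is a rough sketch of the arithmetic I have in mind. The numbers are assumptions chosen for illustration, not taken from any real study: a replication power of 0.8, a false-positive rate of 0.05, and fully independent replications. It deliberately ignores hidden moderators and protocol differences, which is part of why I'm asking.

```python
from math import comb

def prob_at_most_k_successes(n, k, p):
    """Binomial CDF: probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Hypothetical values, assumed for illustration only.
ASSUMED_POWER = 0.8   # chance a replication detects the effect if it is real
ASSUMED_ALPHA = 0.05  # chance a replication "succeeds" if the effect is null

# One replication, zero successes: fairly likely even if the effect is real.
single_failure_if_real = prob_at_most_k_successes(1, 0, ASSUMED_POWER)
single_failure_if_null = prob_at_most_k_successes(1, 0, ASSUMED_ALPHA)

# How much one failure favors "null" over "real" (a likelihood ratio).
single_failure_ratio = single_failure_if_null / single_failure_if_real

# Ten replications with at most three successes: very unlikely if the effect is real.
pattern_if_real = prob_at_most_k_successes(10, 3, ASSUMED_POWER)

print(f"P(one failure | real effect)      = {single_failure_if_real:.3f}")
print(f"Likelihood ratio for one failure  = {single_failure_ratio:.2f}")
print(f"P(<=3 of 10 succeed | real effect) = {pattern_if_real:.5f}")
```

Under these assumed numbers, a single failure happens about one time in five even when the effect is real, so it shifts the odds only modestly, while seven or more failures out of ten would be expected less than one time in a thousand. What I can't see is how to adjust this once replications aren't independent or the protocols differ.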