Outlier exclusion procedures must be blind to the researcher's hypothesis

J Exp Psychol Gen. 2022 Jan;151(1):213-223. doi: 10.1037/xge0001069. Epub 2021 May 31.

Abstract

When researchers choose to identify and exclude outliers from their data, should they do so across all the data, or within experimental conditions? A survey of recent papers published in the Journal of Experimental Psychology: General shows that both methods are widely used, and common data visualization techniques suggest that outliers should be excluded at the condition level. However, I highlight in the present paper that removing outliers by condition runs against the logic of hypothesis testing, and that this practice leads to unacceptable increases in false-positive rates. I demonstrate that this conclusion holds across a variety of statistical tests, exclusion criteria and cutoffs, sample sizes, and data types, and I show in simulated experiments and in a reanalysis of existing data that by-condition exclusions can result in false-positive rates as high as 43%. Finally, I demonstrate that by-condition exclusions are a specific case of a more general issue: Any outlier exclusion procedure that is not blind to the hypothesis that researchers want to test may result in inflated Type I errors. I conclude by offering best practices and recommendations for excluding outliers. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
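To make the central claim concrete, here is a minimal Python sketch of a false-positive simulation. It is not the author's simulation code, and its setup is an assumption chosen for illustration: two conditions of 30 observations each, drawn from the same standard normal distribution (so the null hypothesis is true); outliers defined as points more than 2 SDs from the mean; and a Student's pooled-variance t-test at alpha = .05. All function names (`false_positive_rates`, `exclude_across_all`, `exclude_by_condition`, and the helpers) are hypothetical. The p-value comes from the regularized incomplete beta function, so the sketch needs only the standard library.

```python
import math
import random


def _nonzero(v, tiny=1e-300):
    # Guard against division by zero inside the continued fraction.
    return v if abs(v) > tiny else tiny


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _nonzero(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even term
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd term
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t, df):
    """Two-sided p-value for a Student t statistic: I_{df/(df+t^2)}(df/2, 1/2)."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def mean_var(values):
    """Sample mean and unbiased (n - 1) variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def pooled_t_test(x, y):
    """Student's two-sample t-test with pooled variance; returns (t, p)."""
    (mx, vx), (my, vy) = mean_var(x), mean_var(y)
    df = len(x) + len(y) - 2
    sp2 = ((len(x) - 1) * vx + (len(y) - 1) * vy) / df
    t = (mx - my) / math.sqrt(sp2 * (1.0 / len(x) + 1.0 / len(y)))
    return t, two_sided_p(t, df)


def trim(values, reference, cutoff):
    """Keep values within `cutoff` SDs of the mean of `reference`."""
    m, v = mean_var(reference)
    limit = cutoff * math.sqrt(v)
    return [x for x in values if abs(x - m) <= limit]


def exclude_across_all(a, b, cutoff):
    # Blind to the hypothesis: one mean and SD, computed on the pooled data.
    return trim(a, a + b, cutoff), trim(b, a + b, cutoff)


def exclude_by_condition(a, b, cutoff):
    # Not blind: each condition is trimmed around its own mean and SD.
    return trim(a, a, cutoff), trim(b, b, cutoff)


def false_positive_rates(n_per_group=30, n_sims=5000, cutoff=2.0,
                         alpha=0.05, seed=1):
    """Share of significant t-tests when the null hypothesis is true."""
    rng = random.Random(seed)
    hits = {"none": 0, "across_all": 0, "by_condition": 0}
    for _ in range(n_sims):
        # Both conditions come from the same N(0, 1), so every rejection is a false positive.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        samples = {
            "none": (a, b),
            "across_all": exclude_across_all(a, b, cutoff),
            "by_condition": exclude_by_condition(a, b, cutoff),
        }
        for name, (xa, xb) in samples.items():
            if pooled_t_test(xa, xb)[1] < alpha:
                hits[name] += 1
    return {name: count / n_sims for name, count in hits.items()}
```

Calling `false_positive_rates()` returns the rejection rate with no exclusion, with across-all exclusion, and with by-condition exclusion. Under these assumptions, the first two rates should stay close to alpha, while the by-condition rate should come out noticeably higher. Trimming each condition around its own mean shrinks the within-condition SD without shrinking the observed difference between condition means, so the t statistic is inflated. The across-all cutoff is computed without reference to condition labels, which is what keeps it blind to the hypothesis.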

MeSH terms

  • Data Interpretation, Statistical*
  • Data Visualization*
  • Humans
  • Research Design*