Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: A tutorial using simulations and empirical data

Joost C. F. de Winter, Samuel D. Gosling, Jeff Potter
2016 Psychological methods  
You share, we take care!' -Taverne project Otherwise as indicated in The Pearson product-moment correlation coefficient (r p ) and the Spearman rank correlation coefficient (r s ) are widely used in psychological research. We compare r p and r s on 3 criteria: variability, bias with respect to the population value, and robustness to an outlier. Using simulations across low (N ϭ 5) to high (N ϭ 1,000) sample sizes we show that, for normally
more » ... ributed variables, r p and r s have similar expected values but r s is more variable, especially when the correlation is strong. However, when the variables have high kurtosis, r p is more variable than r s . Next, we conducted a sampling study of a psychometric dataset featuring symmetrically distributed data with light tails, and of 2 Likert-type survey datasets, 1 with light-tailed and the other with heavy-tailed distributions. Consistent with the simulations, r p had lower variability than r s in the psychometric dataset. In the survey datasets with heavy-tailed variables in particular, r s had lower variability than r p , and often corresponded more accurately to the population Pearson correlation coefficient (R p ) than r p did. The simulations and the sampling studies showed that variability in terms of standard deviations can be reduced by about 20% by choosing r s instead of r p . In comparison, increasing the sample size by a factor of 2 results in a 41% reduction of the standard deviations of r s and r p . In conclusion, r p is suitable for light-tailed distributions, whereas r s is preferable when variables feature heavy-tailed distributions or when outliers are present, as is often the case in psychological research.
doi:10.1037/met0000079 pmid:27213982 fatcat:zzaan2i47fejnnahmqbm63m6o4