Statcheck does not work: All the numbers. Reply to Nuijten et al. (2017)

Thomas Schmidt
2017
Statcheck is an R algorithm designed to scan papers automatically for inconsistencies between test statistics and their associated p values (Nuijten et al., 2016). In response to concerns about its reliability and validity, Nuijten et al. (2017) check the output of the program against the sample of 49 papers manually checked by Wicherts et al. (2011), but they do not take into account that the program recognizes only 61% of all statistical tests. Reconstructing the full table of hits, false alarms, misses, and correct rejections with respect to inconsistently reported tests, I show that Statcheck has poor sensitivity (.52) and poor validity (phi = .54): In 1,120 tests, the program scored 29 hits, committed 19 false alarms, and missed an estimated 27 truly inconsistent tests, while 435 tests went unrecognized. If Statcheck flags a test, the flag is correct in only 60.4% of cases. If a test is truly inconsistent, Statcheck flags it in only 51.8% of cases. Overall, only an estimated 5.00% of all tests were actually inconsistent, and the program was clearly biased against flagging them.
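The reported rates can be recomputed from the four counts given above. The following Python sketch is illustrative only and is not taken from the paper: the function and variable names are hypothetical, and it assumes that the correct rejections are simply the remainder of the 1,120 tests once hits, false alarms, and misses are subtracted (so that unrecognized tests count as not flagged).

```python
from math import sqrt

# Counts stated in the abstract.
TOTAL_TESTS = 1120
HITS = 29            # flagged and truly inconsistent
FALSE_ALARMS = 19    # flagged but actually consistent
MISSES = 27          # truly inconsistent but not flagged (estimated)
UNRECOGNIZED = 435   # tests the program did not detect at all

# Assumption (not stated explicitly in the abstract): every remaining
# test is a correct rejection, i.e. consistent and not flagged.
CORRECT_REJECTIONS = TOTAL_TESTS - HITS - FALSE_ALARMS - MISSES


def confusion_metrics(hits, false_alarms, misses, correct_rejections):
    """Return summary rates for a 2x2 detection table."""
    total = hits + false_alarms + misses + correct_rejections
    flagged = hits + false_alarms
    truly_inconsistent = hits + misses
    not_flagged = misses + correct_rejections
    truly_consistent = false_alarms + correct_rejections
    # Phi coefficient: correlation between "flagged" and "inconsistent".
    phi = (hits * correct_rejections - false_alarms * misses) / sqrt(
        flagged * not_flagged * truly_inconsistent * truly_consistent
    )
    return {
        "precision": hits / flagged,               # P(inconsistent | flagged)
        "sensitivity": hits / truly_inconsistent,  # P(flagged | inconsistent)
        "base_rate": truly_inconsistent / total,   # share of inconsistent tests
        "flag_rate": flagged / total,              # share of flagged tests
        "phi": phi,
    }


metrics = confusion_metrics(HITS, FALSE_ALARMS, MISSES, CORRECT_REJECTIONS)
recognized_share = (TOTAL_TESTS - UNRECOGNIZED) / TOTAL_TESTS

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"recognized_share: {recognized_share:.3f}")
```

Under these assumptions the sketch reproduces the figures in the abstract: a precision of about 60.4%, a sensitivity of about 51.8%, a base rate of 5.00%, phi of about .54, and a recognition rate of about 61%. The flag rate (48 of 1,120 tests) falls below the base rate (56 of 1,120), which is one way to read the bias against flagging.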
doi:10.17605/osf.io/hr6qy fatcat:xcygbwm5qfbstjr3mezusgn52u