Mixed-effects design analysis for experimental phonetics

James Kirby, Morgan Sonderegger
2018, unpublished preprint
It is common practice in the statistical analysis of phonetic data to draw conclusions on the basis of statistical significance, often judged by the size of a p-value. While a p-value relates to the probability of incorrectly concluding that a null effect is real, it provides no information about other types of error that are also important for interpreting statistical results. In particular, it is possible to fail to detect a true effect, to exaggerate the magnitude of an effect, or even to incorrectly estimate an effect's direction, resulting in erroneous and biased measures of effect size. In this technical report, we focus on three measures related to these errors. The first, power, reflects the failure to detect an effect that in fact exists. The second and third, Type M and Type S errors, measure the extent to which estimates of the magnitude and direction of an effect are inaccurate. We then provide a 'design analysis' (Gelman & Carlin, 2014), using data from an experimental study on German incomplete neutralization, to illustrate how power, magnitude, and sign errors vary with sample and effect size. This case study shows how the informativity of research findings can vary substantially in ways that are not always, or even usually, apparent on the basis of a p-value alone. We conclude with three recommendations for good statistical practice in phonetics, drawn from best practices widely recommended for the social and behavioral sciences: report all results; design studies which will produce high-precision estimates; and conduct direct replications of previous findings.
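The three quantities discussed above can be estimated by simulation in the style of Gelman & Carlin's (2014) design analysis: repeatedly draw an estimate from a sampling distribution with a hypothesized true effect and standard error, keep only the statistically significant draws, and tabulate how often significance is reached (power), how often the significant estimate has the wrong sign (Type S), and how much it exaggerates the true magnitude on average (Type M). The sketch below is illustrative only (the function name, parameters, and the normal-sampling-distribution assumption are ours, not the authors'):

```python
import random
import statistics

def design_analysis(true_effect, se, n_sims=100_000, z_crit=1.96, seed=1):
    """Monte Carlo design analysis (after Gelman & Carlin, 2014).

    Assumes the estimator is Normal(true_effect, se) and a two-sided
    test at alpha = .05 (|estimate|/se > z_crit counts as significant).
    Returns (power, type_s_rate, type_m_ratio).
    """
    rng = random.Random(seed)
    estimates = (rng.gauss(true_effect, se) for _ in range(n_sims))
    significant = [e for e in estimates if abs(e) / se > z_crit]
    power = len(significant) / n_sims
    # Type S: among significant results, how often is the sign wrong?
    type_s = sum(e * true_effect < 0 for e in significant) / len(significant)
    # Type M: expected exaggeration factor of a significant estimate
    type_m = statistics.mean(abs(e) for e in significant) / abs(true_effect)
    return power, type_s, type_m

# Hypothetical low-power scenario: true effect 5 ms, standard error 10 ms.
power, type_s, type_m = design_analysis(true_effect=5.0, se=10.0)
```

With these illustrative numbers the simulation shows the pattern the report describes: power well below conventional targets, a non-negligible chance that a significant estimate points in the wrong direction, and significant estimates that overstate the true effect severalfold, despite each one arriving with p < .05.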
doi:10.31234/osf.io/y8xcf