On Hypothesis Testing for Comparing Image Quality Assessment Metrics [Tips & Tricks]
IEEE Signal Processing Magazine
This is the accepted version of the paper. This version of the publication may differ from the final published version. Permanent repository link: http://openaccess.city.ac.uk/20448/ Link to published version: http://dx.

In developing novel image quality assessment (IQA) metrics, researchers should compare their proposed metrics with state-of-the-art metrics. A commonly adopted approach is to compare the two sets of residuals between the nonlinearly mapped scores of the two IQA metrics and the difference mean opinion scores, which are assumed to follow Gaussian distributions with zero means. An F-test is then used to test the equality of variances of the two sets of residuals. If the variances are significantly different, we conclude that the residuals come from different Gaussian distributions and hence that the two IQA metrics are significantly different. The F-test, however, assumes that the two sets of residuals are independent. Because the IQA metrics are computed on the same database, the two sets of residuals are paired and may be correlated. We note this improper use of the F-test by practitioners, which can produce misleading comparisons of two IQA metrics. To address this practical problem, we introduce the Pitman test to investigate the equality of variances of two sets of correlated residuals. Experiments on the LIVE database show that the two tests can lead to different conclusions.
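To make the contrast concrete, the two tests the abstract discusses can be sketched as follows. This is a minimal illustration, not the paper's code: the function names and the synthetic correlated residuals are assumptions for demonstration. The Pitman (Pitman-Morgan) test exploits the fact that two paired samples have equal variances exactly when their paired sums and differences are uncorrelated, so it reduces to a t-test on that correlation.

```python
import numpy as np
from scipy import stats


def f_test(res1, res2):
    """Two-sided F-test for equal variances of two INDEPENDENT samples
    (the test the abstract says is misapplied to paired residuals)."""
    v1, v2 = np.var(res1, ddof=1), np.var(res2, ddof=1)
    f = v1 / v2
    df1, df2 = len(res1) - 1, len(res2) - 1
    p = 2 * min(stats.f.cdf(f, df1, df2), stats.f.sf(f, df1, df2))
    return f, p


def pitman_test(res1, res2):
    """Pitman-Morgan test for equal variances of two PAIRED samples:
    t-test on the correlation between paired sums and differences."""
    res1, res2 = np.asarray(res1), np.asarray(res2)
    s, d = res1 + res2, res1 - res2
    r = np.corrcoef(s, d)[0, 1]
    n = len(res1)
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    p = 2 * stats.t.sf(abs(t), n - 2)
    return t, p


# Illustrative synthetic residuals: a shared component makes them
# correlated, mimicking two IQA metrics scored on the same database.
rng = np.random.default_rng(0)
shared = rng.normal(size=500)
res_a = shared + 0.3 * rng.normal(size=500)        # variance ~ 1.09
res_b = 2.0 * shared + 0.3 * rng.normal(size=500)  # variance ~ 4.09

f_stat, f_p = f_test(res_a, res_b)
t_stat, pitman_p = pitman_test(res_a, res_b)
```

Because the Pitman test accounts for the pairing, its p-value can differ markedly from the F-test's on the same residuals, which is the practical point of the abstract.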