An empirical study of sample size in ROC-curve analysis of fingerprint data
Biometric Technology for Human Identification III
Fingerprint datasets often contain millions of samples, so the size of the test sample needed for a biometric evaluation is an important issue for both accuracy and efficiency. In this article, an empirical approach that combines Chebyshev's inequality with simple random sampling is used to determine the sample size for biometric applications. No parametric model is assumed, since the underlying distribution functions of the similarity scores are unknown. The performance of a fingerprint-image matcher is measured by a Receiver Operating Characteristic (ROC) curve, using two criteria: the area under the ROC curve and the True Accept Rate (TAR) at an operational False Accept Rate (FAR). For each criterion, Chebyshev intervals with greater-than-95% probability are computed from 500 Monte Carlo iterations, for different sample sizes and for both high- and low-quality fingerprint-image matchers. The stability of these Monte Carlo calculations with respect to the number of iterations is also explored. The choice of sample size depends on the quality of the matcher and on which performance criterion is used. In general, for 6,000 match similarity scores, 50,000 to 70,000 scores randomly selected from 35,994,000 nonmatch similarity scores can ensure accuracy with greater-than-95% probability.
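The procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes synthetic Gaussian similarity scores in place of real matcher output, a much smaller nonmatch pool than the 35,994,000 scores in the study, an illustrative operational FAR of 0.001, and a Chebyshev multiplier k = 4.5. By Chebyshev's inequality, a statistic falls within k standard deviations of its mean with probability at least 1 - 1/k^2, which exceeds 95% for any k above about 4.47. All function names are hypothetical.

```python
import numpy as np


def auc(match, nonmatch):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    nm = np.sort(nonmatch)
    below = np.searchsorted(nm, match, side="left")         # nonmatch scores < match score
    at_or_below = np.searchsorted(nm, match, side="right")  # nonmatch scores <= match score
    return float(np.mean((below + at_or_below) / 2.0) / nm.size)


def tar_at_far(match, nonmatch, far):
    """True Accept Rate at the threshold whose empirical FAR is about `far`."""
    threshold = np.quantile(nonmatch, 1.0 - far)
    return float(np.mean(match > threshold))


def chebyshev_interval(values, k=4.5):
    """Mean +/- k sample standard deviations; coverage is at least 1 - 1/k**2."""
    mean = float(np.mean(values))
    sd = float(np.std(values, ddof=1))
    return mean - k * sd, mean + k * sd


def monte_carlo(match, nonmatch_pool, sample_size, iterations, rng, far=0.001):
    """Draw simple random samples (without replacement) of nonmatch scores
    and return Chebyshev intervals for the AUC and for the TAR at `far`."""
    aucs, tars = [], []
    for _ in range(iterations):
        sample = rng.choice(nonmatch_pool, size=sample_size, replace=False)
        aucs.append(auc(match, sample))
        tars.append(tar_at_far(match, sample, far))
    return {"auc": chebyshev_interval(aucs), "tar": chebyshev_interval(tars)}


def demo():
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for real similarity scores (assumption).
    match = rng.normal(3.0, 1.0, size=6_000)
    nonmatch_pool = rng.normal(0.0, 1.0, size=200_000)
    for n in (5_000, 20_000, 50_000):
        result = monte_carlo(match, nonmatch_pool, n, iterations=100, rng=rng)
        auc_lo, auc_hi = result["auc"]
        tar_lo, tar_hi = result["tar"]
        print(f"n={n:>6}  AUC width={auc_hi - auc_lo:.6f}  TAR width={tar_hi - tar_lo:.6f}")


if __name__ == "__main__":
    demo()
```

Comparing the interval widths across sample sizes gives a rough sense of how many nonmatch scores are needed before further sampling stops improving the estimate. Rerunning with a different number of iterations is one way to check the stability that the abstract mentions.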