An Empirical Evaluation of Automated Black Box Testing Techniques for Crashing GUIs
2009 International Conference on Software Testing Verification and Validation
This paper reports an empirical evaluation of four black box testing techniques for crashing programs through their GUI: SH, AF, DH, and BxT. The techniques vary in their level of automation and the results they offer. The experiments we conducted quantify execution time and the capability of finding a crash for each technique on 8 different cellular phone configurations with historical (real) errors. The results show that AF and BxT offered better precision than SH and DH (AF and BxT found crashes in all 8 configurations), and that BxT crashed the application the fastest most often (in 5 out of 8 cases). The experiments reveal that the choice of random seed for AF and BxT results in a high variance of execution time (i.e., the time the technique takes to either crash the application or time out at 40h): the median (across the 8 phone configurations) of the standard deviation of execution times (over 10 runs per phone configuration) is 7.79h for AF and 5.21h for BxT. Despite this variance, AF and BxT crashed the application consistently: the median of the precision (the fraction of the 10 runs that result in a crash) is 74% for AF and 69% for BxT.
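
The two summary statistics quoted above (the median across configurations of the per-configuration standard deviation of execution time, and the median of the per-configuration precision) can be illustrated with a minimal sketch. The configuration names and run times below are made up for illustration and are not data from the paper. The sketch also assumes that a run whose recorded time equals the 40h timeout did not crash the application, and that the sample (n-1) standard deviation is used; the abstract does not state either detail.

```python
from statistics import median, stdev

TIMEOUT_H = 40.0  # timeout stated in the abstract, in hours


def precision(times):
    """Fraction of runs that crashed the application.

    Assumption: a run that reached TIMEOUT_H counts as "no crash".
    """
    return sum(t < TIMEOUT_H for t in times) / len(times)


def summarize(runs_by_config):
    """Return (median of per-configuration std devs, median of precisions).

    Assumption: sample standard deviation (n - 1 denominator).
    """
    stdevs = [stdev(times) for times in runs_by_config.values()]
    precisions = [precision(times) for times in runs_by_config.values()]
    return median(stdevs), median(precisions)


# Hypothetical execution times in hours, one list per phone configuration.
runs = {
    "config_a": [2.0, 4.0, 40.0, 40.0],
    "config_b": [1.0, 1.0, 1.0, 1.0],
    "config_c": [10.0, 20.0, 30.0, 40.0],
}

median_sd, median_precision = summarize(runs)
print(f"median std dev: {median_sd:.2f}h, median precision: {median_precision:.0%}")
```

In the study's setting, each list would hold the 10 execution times obtained with different random seeds for one of the 8 phone configurations, and the summary would be computed once per technique.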