EAGLE: Creating Equivalent Graphs to Test Deep Learning Libraries

Jiannan Wang, Thibaud Lutellier, Shangshu Qian, Hung Viet Pham, Lin Tan
2022 International Conference on Software Engineering  
Testing deep learning (DL) software is crucial and challenging. Recent approaches use differential testing to cross-check pairs of implementations of the same functionality across different libraries. Such approaches require two DL libraries implementing the same functionality, which is often unavailable. In addition, they rely on a high-level library, Keras, that implements the missing functionality in all supported DL libraries, which is prohibitively expensive and thus no longer maintained. To address this issue, we propose EAGLE, a new technique that uses differential testing in a different dimension, by using equivalent graphs to test a single DL implementation (e.g., a single DL library). Equivalent graphs use different Application Programming Interfaces (APIs), data types, or optimizations to achieve the same functionality. The rationale is that two equivalent graphs executed on a single DL implementation should produce identical output given the same input. Specifically, we design 16 new DL equivalence rules and propose a technique, EAGLE, that (1) uses these equivalence rules to build concrete pairs of equivalent graphs and (2) cross-checks the output of these equivalent graphs to detect inconsistency bugs in a DL library. Our evaluation on two widely-used DL libraries, i.e., TensorFlow and PyTorch, shows that EAGLE detects 25 bugs (18 in TensorFlow and 7 in PyTorch), including 13 previously unknown bugs.

CCS CONCEPTS: • Software and its engineering → Software testing and debugging; Software reliability.
doi:10.1145/3510003.3510165 dblp:conf/icse/WangLQP022
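
To illustrate the cross-checking rationale described in the abstract, the following is a minimal sketch, not taken from the paper: the specific equivalence rule (a Linear layer versus its explicit matrix-multiply formulation) and all names are illustrative assumptions. Two PyTorch graphs built from different APIs that should compute the same function are run on the same input, and their outputs are compared within a floating-point tolerance; a large divergence would flag a potential inconsistency bug in the library under test.

    import torch

    # Illustrative equivalence rule (assumption, not one of the paper's 16 rules):
    # a Linear layer and the explicit matrix-multiply-plus-bias formulation
    # should compute the same function when they share the same weights.
    torch.manual_seed(0)
    layer = torch.nn.Linear(in_features=8, out_features=4)

    def graph_a(x):
        # Graph A: high-level layer API.
        return layer(x)

    def graph_b(x):
        # Graph B: equivalent low-level formulation using the same weights.
        return x @ layer.weight.t() + layer.bias

    x = torch.randn(16, 8)
    out_a, out_b = graph_a(x), graph_b(x)

    # Cross-check: on a correct implementation the two outputs should agree
    # up to floating-point tolerance; a mismatch is reported as a potential
    # inconsistency bug.
    assert torch.allclose(out_a, out_b, rtol=1e-5, atol=1e-6), \
        "Inconsistency detected between equivalent graphs"

In the same spirit, the paper's other rule categories (data types, optimizations) would swap in, for example, a float32 graph against a float64 graph, or an optimized graph against its unoptimized counterpart, while keeping the same input and output comparison.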