Checking Time Linearity of Regular Expression Matching Based on Backtracking

Satoshi Sugiyama, Yasuhiko Minamide
2014 IPSJ Online Transactions  
Most implementations of regular expression matching in programming languages are based on backtracking. With this implementation strategy, matching may not be achieved in linear time with respect to the length of the input. In the worst case, it may take exponential time. In this paper, we propose a method of checking whether or not regular expression matching runs in linear time. We construct a top-down tree transducer with regular lookahead that translates the input string into a tree corresponding to the execution steps of matching based on backtracking. The regular expression matching then runs in linear time if the tree transducer is of linear size increase. To check this property of the tree transducer, we apply a result of Engelfriet and Maneth. We implemented the method in OCaml and conducted experiments that checked the time linearity of regular expressions appearing in several popular PHP programs. Our implementation showed that 47 of 393 regular expressions were not linear.
doi:10.2197/ipsjtrans.7.82 fatcat:e55skq5uerempaioci36ns6owa
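
The worst-case behaviour described in the abstract can be illustrated with a small sketch. This is not the paper's tree-transducer construction or its OCaml implementation. It is a toy backtracking matcher, written in Python for brevity, that counts its own execution steps. The tuple encoding of regular expressions and the names `match`, `run`, `LINEAR`, and `AMBIGUOUS` are assumptions made for this illustration only.

```python
# Toy backtracking matcher with a step counter (illustrative sketch only).
# Regex encoding (hypothetical, for this example):
#   ("chr", c)      matches the single character c
#   ("cat", r1, r2) matches r1 followed by r2
#   ("alt", r1, r2) tries r1 first, then r2 on failure
#   ("star", r)     greedy repetition of r

def match(r, s, i, k, counter):
    """Try to match r against s starting at index i.

    k is the continuation: it receives the index reached after r and
    decides whether the rest of the match succeeds. counter[0] is
    incremented once per call, so it approximates the number of
    execution steps of the backtracking search.
    """
    counter[0] += 1
    tag = r[0]
    if tag == "chr":
        return i < len(s) and s[i] == r[1] and k(i + 1)
    if tag == "cat":
        return match(r[1], s, i,
                     lambda j: match(r[2], s, j, k, counter), counter)
    if tag == "alt":
        return (match(r[1], s, i, k, counter)
                or match(r[2], s, i, k, counter))
    if tag == "star":
        # Greedy: first try one more iteration (it must consume input,
        # which rules out infinite loops), then fall back to stopping here.
        return (match(r[1], s, i,
                      lambda j: j > i and match(r, s, j, k, counter),
                      counter)
                or k(i))
    raise ValueError("unknown regex node: %r" % (tag,))


def run(r, s):
    """Return (matched_whole_string, number_of_steps)."""
    counter = [0]
    ok = match(r, s, 0, lambda j: j == len(s), counter)
    return bool(ok), counter[0]


A = ("chr", "a")
B = ("chr", "b")
LINEAR = ("cat", ("star", A), B)                     # a*b
AMBIGUOUS = ("cat", ("star", ("alt", A, A)), B)      # (a|a)*b

if __name__ == "__main__":
    for n in (5, 10, 15):
        text = "a" * n                               # no trailing b: match fails
        print(n, run(LINEAR, text)[1], run(AMBIGUOUS, text)[1])
```

On inputs consisting only of `a`, both expressions fail to match, but the step count for `a*b` grows linearly with the input length, while the count for `(a|a)*b` roughly doubles with each additional character, because every `a` can be consumed by either branch of the alternation and the matcher retries both.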