Extending regular expressions with context operators and parse extraction

Steven M. Kearns
1991 Software, Practice & Experience  
Regular expressions are used in many applications to specify patterns because any regular expression can be compiled into a very efficient one-pass pattern matcher called a finite automaton. Finding matches is useful, but even more useful is parse extraction, which describes in detail how a pattern matches some input. After matching an address, for example, parse extraction makes it easy to find out the Zip code part of the address. We present an elegant, efficient algorithm for extracting a parse after matching with a finite automaton. In addition, we extend the regular expression language to include new operators for matching arbitrary left context and single character right context. The extended language can be matched as efficiently as the usual regular expression language, but is more expressive. Finally, we suggest how to apply the matching algorithms to match regular expressions containing arbitrary right context and single character left context. In effect, this allows one to specify patterns that seem to require an unlimited amount of look-ahead to match.

KEY WORDS: Regular expressions; Context sensitive; String matching; Parse extraction

Motivating parse extraction

The fastest algorithms for matching a regular expression first compile a regular expression into a deterministic finite automaton (DFA) and then use the DFA to find the starting and/or ending positions of a match in an input string. This is fine
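As a rough illustration of what parse extraction means in the address example from the abstract, the sketch below uses named groups in Python's standard `re` module to recover the Zip code part of a match. This is not the paper's algorithm: Python's `re` is a backtracking matcher rather than a compiled finite automaton, and the address pattern, the group names, and the `extract_zip` helper are hypothetical choices made only for this example.

```python
import re

# Hypothetical, deliberately simplified address pattern. The named groups
# mark the parts of the match we want to recover after matching.
ADDRESS = re.compile(
    r"(?P<street>\d+ [A-Za-z ]+), (?P<city>[A-Za-z ]+), "
    r"(?P<state>[A-Z]{2}) (?P<zip>\d{5})"
)


def extract_zip(text):
    """Return the Zip code part of the first address found in text, or None."""
    match = ADDRESS.search(text)
    return match.group("zip") if match else None


sample = "Ship to 12 Main Street, Springfield, IL 62704 by Friday."
print(extract_zip(sample))  # prints 62704
```

Knowing only where the match starts and ends would give the whole address substring; the named groups additionally say which part of the input each piece of the pattern consumed, which is the kind of information parse extraction is meant to provide.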
doi:10.1002/spe.4380210803 fatcat:tbht73rkajgltn47zkqddg4vx4