POSIX Regular Expression Parsing with Derivatives [chapter]

Martin Sulzmann, Kenny Zhuo Ming Lu
2014 Lecture Notes in Computer Science  
We show how to obtain a POSIX algorithm for the general parsing problem based on Brzozowski's regular expression derivatives.  ...  We adapt the POSIX policy to the setting of regular expression parsing. POSIX favors longest left-most parse trees.  ...  The idea is to incrementally annotate regular expressions with partial parse tree information during the derivative step. Annotated regular expressions ri are defined in Figure 5.  ... 
doi:10.1007/978-3-319-07151-0_13 fatcat:vq6gvcdenncnxbn453vz22od7u

Derivative-Based Diagnosis of Regular Expression Ambiguity [article]

Martin Sulzmann, Kenny Zhuo Ming Lu
2016 arXiv   pre-print
Regular expressions are often ambiguous. We present a novel method based on Brzozowski's derivatives to aid the user in diagnosing ambiguous regular expressions.  ...  We introduce a derivative-based finite state transducer to generate parse trees and minimal counter-examples.  ...  Computing Parse Trees via Derivatives Derivatives denote left quotients and they can be computed via a simple syntactic transformation. Definition 6 (Regular Expression Derivatives).  ... 
arXiv:1604.06644v2 fatcat:neorj6cyzrbsnouzau7dvwrqfe

The Design of a Verified Derivative-Based Parsing Tool for Regular Expressions

Elton Cardoso, Maycon Amaro, Samuel Feitosa, Leonardo Reis, André Du Bois, Rodrigo Ribeiro
2021 CLEI Electronic Journal  
We describe the formalization of Brzozowski and Antimirov derivative based algorithms for regular expression parsing, in the dependently typed language Agda.  ...  A tool for regular expression based search in the style of the well known GNU grep has been developed with the certified algorithms. Practical experiments conducted with this tool are reported.  ...  Regular Expressions Regular expressions are defined with respect to a given alphabet.  ... 
doi:10.19153/cleiej.24.3.2 fatcat:ebvo2a2wo5gezadk55tieuxi5i

Certified Derivative-Based Parsing of Regular Expressions [chapter]

Raul Lopes, Rodrigo Ribeiro, Carlos Camarão
2016 Lecture Notes in Computer Science  
We describe the formalization of a certified algorithm for regular expression parsing based on Brzozowski derivatives, in the dependently typed language Idris.  ...  A tool for regular expression based search in the style of the well known GNU grep has been developed with the certified algorithm, and practical experiments were conducted with this tool.  ...  Regular Expressions Regular expressions are defined with respect to a given alphabet.  ... 
doi:10.1007/978-3-319-45279-1_7 fatcat:lzvpux3jcvc6tpfu2kvazcea7u

Disambiguation in Regular Expression Matching via Position Automata with Augmented Transitions [chapter]

Satoshi Okui, Taro Suzuki
2011 Lecture Notes in Computer Science  
Keywords: regular expression, automata, pattern matching  ...  This paper offers a new efficient regular expression matching algorithm which follows the leftmost-longest rule specified in POSIX 1003.2 standard.  ...  We say a parse configuration ⟨u, t, v⟩ derives a string uwv if the canonical parse tree t derives the string w.  ... 
doi:10.1007/978-3-642-18098-9_25 fatcat:qa7ieweuabc2zpghzqyyt4umju

RE2C: A lexer generator based on lookahead-TDFA

U. Trofimovich
2020 Software Impacts  
RE2C is a regular expression compiler: it transforms regular expressions into finite state machines and encodes them as programs in the target language.  ...  [5] Angelo Borsotti, Ulya Trofimovich, Efficient POSIX Submatch Extraction on NFA, preprint, 2019, URL:  ...  Submatch extraction is a special case of the parsing problem: in addition to solving the recognition problem it has to find the derivation of the input string in the grammar defined by the regular expression  ... 
doi:10.1016/j.simpa.2020.100027 fatcat:nneqzhzmy5dzfb2u2cfyv53grm

Declarative Cleaning of Inconsistencies in Information Extraction

Ronald Fagin, Benny Kimelfeld, Frederick Reiss, Stijn Vansummeren
2016 ACM Transactions on Database Systems  
We show that our framework captures the popular cleaning policies, as well as the POSIX semantics for extraction through regular expressions.  ...  ., always results in a single repair) and whether it increases the expressive power of the extraction language.  ...  An example of a spanner representation is a regex formula: a regular expression with embedded capture variables that are viewed as relational attributes.  ... 
doi:10.1145/2877202 fatcat:rkgfhc7nkjcg7gs7jfau3mngka

Cleaning inconsistencies in information extraction via prioritized repairs

Ronald Fagin, Benny Kimelfeld, Frederick Reiss, Stijn Vansummeren
2014 Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '14  
We show that our framework captures the popular cleaning policies, as well as the POSIX semantics for extraction through regular expressions.  ...  Dealing with inconsistencies is hence of crucial importance in IE systems.  ...  A regular expression with capture variables, or just variable regex for short, is an expression in the following syntax that extends that of regular expressions: γ := ∅ | ε | σ | γ ∨ γ | γ · γ | γ* | x{γ}  ... 
doi:10.1145/2594538.2594540 dblp:conf/pods/FaginKRV14 fatcat:hslqgvt4yfcile4h7wtruoz6lq

Type inference for unique pattern matching

Stijn Vansummeren
2006 ACM Transactions on Programming Languages and Systems  
Regular expression patterns provide a natural, declarative way to express constraints on semistructured data and to extract relevant information from it.  ...  Since regular expressions can be ambiguous in general, different disambiguation policies have been proposed to get a unique matching strategy.  ...  Syntactically, regular (hedge) expression patterns are regular (hedge) expressions annotated with variable binders.  ... 
doi:10.1145/1133651.1133652 fatcat:jbhrv3ffyndg5hl6krq4wdrko4

Stream Processing using Grammars and Regular Expressions [article]

Ulrik Terp Rasmussen
2017 arXiv   pre-print
In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation.  ...  In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex.  ...  INTRODUCTION Regular Expression Based Parsing The first part will be concerned with regular expressions, an algebraic formalism with a well-understood theory that is commonly used to express patterns  ... 
arXiv:1704.08820v1 fatcat:twr3ysgyzzg3nap22n6xvepp5i

A Relational Framework for Information Extraction

Ronald Fagin, Benny Kimelfeld, Frederick Reiss, Stijn Vansummeren
2016 SIGMOD record  
This task is pervasive in contemporary computational challenges associated with Big Data.  ...  Regular expressions that have this feature are called extended regular expressions (xregex for short) [1, 8, 9]. It is known that xregex can recognize non-regular languages, such as {ss | s ∈ Σ*}.  ...  Various languages for querying semi-structured and graph databases are based on regular expressions. A simple form of such queries are the regular path queries  ... 
doi:10.1145/2935694.2935696 fatcat:yxhivm3s4bennonn6zfmhyin5a

Tagged Deterministic Finite Automata with Lookahead [article]

Ulya Trofimovich
2019 arXiv   pre-print
This paper extends the work of Laurikari and Kuklewicz on tagged deterministic finite automata (TDFA) in the context of submatch extraction in regular expressions.  ...  The proposed algorithm can handle repeated submatch and therefore is applicable to full parsing.  ...  One useful extension of traditional regular expressions that cannot be implemented using ordinary DFA is submatch extraction and parsing.  ... 
arXiv:1907.08837v1 fatcat:gnlgj5jx3vhkfguyzrgkg2gc6q

Regular expression containment

Fritz Henglein, Lasse Nielsen
2011 Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '11  
Our axiomatization gives rise to a natural computational interpretation of regular expressions as simple types that represent parse trees, and of containment proofs as coercions.  ...  We show how to encode regular expression equivalence proofs in Salomaa's, Kozen's and Grabmayer's axiomatizations into our containment system, which equips their axiomatizations with a computational interpretation  ...  the applications of regular expressions as types.  ... 
doi:10.1145/1926385.1926429 dblp:conf/popl/HengleinN11 fatcat:u3xtunpeavap5mbkd5whm3umva

Analysing installation scenarios of Debian packages [chapter]

Benedikt Becker, Nicolas Jeannerod, Claude Marché, Yann Régis-Gianas, Mihaela Sighireanu, Ralf Treinen
2020 Lecture Notes in Computer Science  
The Debian distribution includes more than 28 thousand maintainer scripts, almost all of them are written in Posix shell.  ...  These scripts are executed with root privileges at installation, update, and removal of a package, which make them critical for system maintenance.  ...  All of those scripts that are syntactically correct with respect to the Posix standard (99.9%) are parsed successfully by our parser.  ... 
doi:10.1007/978-3-030-45237-7_14 fatcat:43ybsru5lnc6plmy5oflpqitaa

Transducers from Rewrite Rules with Backreferences [article]

Dale Gerdemann, Gertjan van Noord
1999 arXiv   pre-print
The range of possibilities leaves plenty of room for future research. lists the relevant regular expression operators. FSA Utilities offers the possibility to define new regular expression operators.  ...  To compare this with a backreference in Perl, suppose that Tacr is a subroutine that converts phrases into acronyms and that Racr is a regular expression matching phrases that can be converted into acronyms  ... 
arXiv:cs/9904008v1 fatcat:igrpk7ij25b2dndcsc3ny2kcke
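
Several of the results above build on Brzozowski derivatives, which one abstract describes as left quotients that "can be computed via a simple syntactic transformation". As a rough illustration only, the following Python sketch shows that transformation used as a plain recognizer. It is not the POSIX parsing algorithm of any listed paper: it produces no parse trees, applies no disambiguation policy, and performs no simplification, so derivative terms can grow. All class and function names (`Re`, `deriv`, `nullable`, `matches`) are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


class Re:
    """Base class for the regular expression syntax tree (illustrative)."""


@dataclass(frozen=True)
class Empty(Re):
    """Matches no string at all."""


@dataclass(frozen=True)
class Eps(Re):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Chr(Re):
    c: str


@dataclass(frozen=True)
class Alt(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Seq(Re):
    first: Re
    second: Re


@dataclass(frozen=True)
class Star(Re):
    inner: Re


def nullable(r: Re) -> bool:
    """Return True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.first) and nullable(r.second)
    raise TypeError(f"unknown regular expression node: {r!r}")


def deriv(r: Re, c: str) -> Re:
    """Derivative of r with respect to c: the left quotient of L(r) by c."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, c), deriv(r.right, c))
    if isinstance(r, Seq):
        head = Seq(deriv(r.first, c), r.second)
        # If the first part can be empty, c may also be consumed by the second.
        return Alt(head, deriv(r.second, c)) if nullable(r.first) else head
    if isinstance(r, Star):
        return Seq(deriv(r.inner, c), r)
    raise TypeError(f"unknown regular expression node: {r!r}")


def matches(r: Re, s: str) -> bool:
    """Take one derivative per input character, then test nullability."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)
```

For example, with `r = Seq(Star(Alt(Chr("a"), Chr("b"))), Chr("c"))`, which encodes `(a|b)*c`, calling `matches(r, "abac")` takes four derivative steps and then checks whether the residual expression accepts the empty string.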