Non-size increasing graph rewriting for natural language processing
Mathematical Structures in Computer Science
A very large amount of work in Natural Language Processing (NLP) uses trees as first-class mathematical structures to represent linguistic structures, such as parsed sentences or feature structures. However, some linguistic phenomena are not properly handled by trees; for instance, in the sentence 'Max decides to leave,' 'Max' is the subject of both predicates 'to_decide' and 'to_leave'. Tree-based linguistic formalisms generally rely on some encoding to handle sentences like the previous example. In former papers (Bonfante et al. 2011; Guillaume and Perrier 2012), we discussed the benefits of using graphs rather than trees to deal with linguistic structures, and we showed how Graph Rewriting can be used to process them, for instance to transform the syntax of a sentence into its semantics. Our experiments have shown that Graph Rewriting applications to NLP do not require the full computational power of general Graph Rewriting. The most important observation is that all graph vertices in the final structures are in some sense 'predictable' from the input data, so we can consider the framework of Non-size increasing Graph Rewriting. In our previous papers, we formally described the Graph Rewriting calculus we used; our purpose here is to study the theoretical aspect of termination with respect to this calculus. Since termination is undecidable in general, we define termination criteria based on weights, we prove the termination of weighted rewriting systems, and we give complexity bounds on derivation lengths for these rewriting systems.
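The general principle behind weight-based termination can be illustrated with a toy example. The sketch below is not the calculus or weight definition from this paper. It assumes a graph is a set of labelled edges, that rules never create nodes (non-size increasing), and that the weight is simply the number of edges still carrying a syntactic label. The labels (`subj`, `obj`, `ARG0`, `ARG1`), the `relabel` rule factory and the `weight` function are all hypothetical. Because each rule application strictly decreases a non-negative integer weight, every derivation terminates in at most `weight(G0)` steps. The 'Max decides to leave' graph is used as input because 'Max' has two incoming `subj` edges, so it is not a tree.

```python
from typing import Callable, Optional

Edge = tuple[str, str, str]          # (source node, label, target node)
Graph = frozenset[Edge]
Rule = Callable[[Graph], Optional[Graph]]

# Hypothetical set of syntactic labels that rules are expected to consume.
SYNTACTIC = {"subj", "obj"}

def weight(g: Graph) -> int:
    # Hypothetical weight: number of edges still carrying a syntactic label.
    return sum(1 for (_, lab, _) in g if lab in SYNTACTIC)

def nodes(g: Graph) -> set[str]:
    return {n for (s, _, t) in g for n in (s, t)}

def relabel(old: str, new: str) -> Rule:
    # Hypothetical rule: rewrite one edge labelled `old` into one labelled `new`.
    def rule(g: Graph) -> Optional[Graph]:
        for e in sorted(g):          # deterministic choice of the redex
            src, lab, tgt = e
            if lab == old:
                return (g - {e}) | {(src, new, tgt)}
        return None                  # rule does not apply
    return rule

def normalize(g: Graph, rules: list[Rule]) -> tuple[Graph, int]:
    bound = weight(g)                # derivation length cannot exceed this
    steps = 0
    while True:
        for rule in rules:
            h = rule(g)
            if h is not None:
                # Non-size increasing: no new node is ever created.
                assert nodes(h) <= nodes(g)
                # Weighted termination: each step strictly decreases the weight.
                assert 0 <= weight(h) < weight(g)
                g, steps = h, steps + 1
                break
        else:                        # no rule applies: normal form reached
            assert steps <= bound
            return g, steps

RULES = [relabel("subj", "ARG0"), relabel("obj", "ARG1")]

# 'Max decides to leave': 'Max' is the subject of both predicates.
sentence: Graph = frozenset({
    ("decide", "subj", "Max"),
    ("leave", "subj", "Max"),
    ("decide", "obj", "leave"),
})
result, steps = normalize(sentence, RULES)
```

Here the initial weight is 3, so no derivation from `sentence` can take more than three steps, whatever order the rules are tried in. Real systems need richer weights (for instance, combining node and edge counts), but the bound on derivation length follows from the same argument.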