Splicing systems and the Chomsky hierarchy

Jean Berstel, Luc Boasson, Isabelle Fagnot
2012 Theoretical Computer Science  
In this paper, we prove decidability properties and new results on the position of the family of languages generated by (circular) splicing systems within the Chomsky hierarchy. The two main results of the paper are the following. First, we show that it is decidable, given a circular splicing language and a regular language, whether they are equal. Second, we prove that the language generated by an alphabetic splicing system is context-free. Alphabetic splicing systems are a generalization of simple and semi-simple splicing systems already considered in the literature.

A flat splicing system, or a splicing system for short, is a triplet S = (A, I, R), where A is an alphabet, I is a set of words over A, called the initial set, and R is a finite set of splicing rules, which are quadruplets ⟨α|γ−δ|β⟩ of words over A. The words α, β, γ and δ are called the handles of the rule.

Let r = ⟨α|γ−δ|β⟩ be a splicing rule. Given two words u = xα · βy and v = γzδ, applying r to the pair (u, v) yields the word w = xα · γzδ · βy. (The dots are used only to mark the places of cutting and pasting; they are not parts of the words.) This operation is denoted by u, v ⊢_r w and is called a production. Note that the first word (here u) is always the one into which the second word (here v) is inserted.

Example 2.1.
1. Consider the splicing rule r = ⟨ab|aa−b|c⟩. We have the production bab · cc, aaccb ⊢_r bab · aaccb · cc.
2. Consider the splicing rule ⟨b|a−a|b⟩. Note that we cannot produce the word b · a · b from the word b · b and the singleton a, because the rule requires that the inserted word have at least two letters. By contrast, the rule ⟨b|ε−a|b⟩ does produce the word bab from the words bb and a.
3. For the rule r = ⟨ε|a−a|b⟩, the production · bbc, aba ⊢_r aba · bbc is in fact a concatenation.
4. As a final example, the rule ⟨ε|ε−ε|ε⟩ permits all insertions (including concatenations) of one word into another.

The language generated by the flat splicing system S = (A, I, R), denoted F(S), is the smallest language L containing I and closed under R, i.e., such that for any pair of words u and v in L and any rule r in R, every word w such that u, v ⊢_r w is also in L.

Example 2.2. Consider the splicing system over A = {a, b} with initial set I = {ab} and the unique splicing rule r = ⟨a|a−b|b⟩. It generates the context-free and non-regular language F(S) = {a^n b^n | n ≥ 1}. (This example is a straightforward adaptation of Example 2.3 below.)
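The production step and the closure defining F(S) can be sketched in a few lines of Python. This is only an illustrative sketch and not code from the paper. It assumes a rule ⟨α|γ−δ|β⟩ is encoded as a tuple of strings (alpha, gamma, delta, beta), with the empty string standing for ε. The function names splice and generate are hypothetical. Since a production always yields a word of length |u| + |v|, truncating the closure at a maximal length max_len loses no word of that length or shorter.

```python
from itertools import product


def splice(rule, u, v):
    """Return every word w such that u, v |-_r w.

    rule is a tuple (alpha, gamma, delta, beta), an assumed encoding of
    the rule <alpha|gamma - delta|beta>; "" stands for the empty word.
    """
    alpha, gamma, delta, beta = rule
    # v must factor as gamma z delta, with non-overlapping handles.
    if len(v) < len(gamma) + len(delta):
        return set()
    if not (v.startswith(gamma) and v.endswith(delta)):
        return set()
    site = alpha + beta  # u must factor as x alpha . beta y
    results = set()
    start = u.find(site)
    while start != -1:
        cut = start + len(alpha)  # position of the dot in u
        results.add(u[:cut] + v + u[cut:])
        start = u.find(site, start + 1)
    return results


def generate(initial, rules, max_len):
    """Return the words of F(S) of length at most max_len."""
    language = {w for w in initial if len(w) <= max_len}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # sorted() takes a snapshot, so adding words below is safe.
            for u, v in product(sorted(language), repeat=2):
                for w in splice(rule, u, v):
                    if len(w) <= max_len and w not in language:
                        language.add(w)
                        changed = True
    return language


# Example 2.1, item 1: the rule <ab|aa-b|c> applied to (babcc, aaccb).
example_1 = splice(("ab", "aa", "b", "c"), "babcc", "aaccb")

# Example 2.2: I = {ab} with the single rule <a|a-b|b>, up to length 8.
example_2 = generate({"ab"}, [("a", "a", "b", "b")], 8)
```

Here example_1 is the singleton containing babaaccbcc, and example_2 consists of ab, aabb, aaabbb and aaaabbbb, in agreement with Examples 2.1 and 2.2. Note that splice returns a set because a rule may apply at several positions of u.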
A splicing system is finite (resp. regular, context-free, context-sensitive) if its initial set is finite (resp. regular, context-free, context-sensitive). A rule r = ⟨α|γ−δ|β⟩ is alphabetic if its four handles α, β, γ and δ are letters or the empty word. A splicing system is alphabetic if all its rules are alphabetic.

Splicing and insertion systems are incomparable

Insertion systems, as described in [16], look quite similar to flat splicing systems. This justifies comparing the two generating devices.
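The definition of alphabetic rules and systems given above amounts to a simple length check on the handles. The following Python sketch illustrates it under the assumption that a rule ⟨α|γ−δ|β⟩ is encoded as a tuple of strings (alpha, gamma, delta, beta), with the empty string standing for ε. The function names are hypothetical and not taken from the paper.

```python
def is_alphabetic_rule(rule):
    """True if each of the four handles is a single letter or empty."""
    return all(len(handle) <= 1 for handle in rule)


def is_alphabetic_system(rules):
    """True if every rule of the system is alphabetic."""
    return all(is_alphabetic_rule(rule) for rule in rules)
```

With this encoding, the rule ⟨a|a−b|b⟩ is alphabetic, whereas ⟨ab|aa−b|c⟩ is not, since two of its handles have length two.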
doi:10.1016/j.tcs.2012.03.008 fatcat:llyu7onjxna5xbfiop6u2ff6na