Determining Sets of Quasiperiods of Infinite Words

Guilhem Gamard, Gwenaël Richomme, Marc Herbstritt
2016 International Symposium on Mathematical Foundations of Computer Science  
A word is quasiperiodic if it can be obtained by concatenations and overlaps of a smaller word, called a quasiperiod. Based on links between quasiperiods, right special factors and square factors, we introduce a method to determine the set of quasiperiods of a given right infinite word. Then we study the structure of the sets of quasiperiods of right infinite words and, using our method, we provide examples of right infinite words with extremal sets of quasiperiods (no quasiperiod is quasiperiodic, all quasiperiods except one are quasiperiodic, ...). Our method is also used to provide a short proof of a recent characterization of quasiperiods of the Fibonacci word. Finally we extend this result to a new characterization of standard Sturmian words using a property of their sets of quasiperiods.
set of quasiperiods of any infinite word. Next we describe several examples of uses of this method.
In [6], Christou, Crochemore and Iliopoulos provide characterizations of quasiperiods of Fibonacci strings. One of their motivations was that "Fibonacci strings are important in many concepts [3] and are often cited as a worst case example for many string algorithms." However, Fibonacci strings are not always the best words for this purpose. For example, Groult and Richomme [11] proved that the algorithms provided by Brodal and Pedersen [5] and by Iliopoulos and Mouchard [12] to compute all the quasiperiods of a word do not reach their worst case on Fibonacci strings. They proved that those algorithms were optimal, and provided a family of strings reaching the worst case. Nevertheless, the study of finite Fibonacci strings is indeed of great interest: see, e.g., references in [6].
Some of the results from [6] were recently reformulated by Mousavi, Schaeffer and Shallit as a new characterization of quasiperiods of the infinite Fibonacci word [21] (another one was given in [14]). They use this result, among many others, to show how to build automated proofs of some results about the Fibonacci word. Using the method to determine the set of quasiperiods of any infinite word described in Section 2, we provide a short proof of the above-mentioned characterization (Section 4.1).
The infinite Fibonacci word is a special case of Sturmian words (and therefore Fibonacci strings are special cases of factors of Sturmian words). A natural question is whether the previous characterization of quasiperiods of the Fibonacci word can be extended to other Sturmian words.
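As an illustration of the definition of a quasiperiod recalled above, the following Python sketch tests a candidate word against a finite prefix of the Fibonacci word. It is not the method of Section 2: the function names (`fibonacci_prefix`, `occurrences`, `covers_prefix`) are hypothetical, and the test is only a necessary condition, since a finite prefix cannot certify a property of the whole infinite word.

```python
def fibonacci_prefix(n):
    """Return the first n letters of the infinite Fibonacci word,
    obtained by iterating the morphism a -> ab, b -> a from the letter a."""
    w = "a"
    while len(w) < n:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w[:n]


def occurrences(q, w):
    """Start positions of all (possibly overlapping) occurrences of q in w."""
    starts = []
    i = w.find(q)
    while i != -1:
        starts.append(i)
        i = w.find(q, i + 1)
    return starts


def covers_prefix(q, w):
    """Necessary condition for q to be a quasiperiod of an infinite word
    with finite prefix w: every position up to len(w) - len(q) must lie
    inside an occurrence of q that is fully visible in w."""
    if not q or len(q) > len(w):
        return False
    covered_until = 0  # first position not yet known to be covered
    for s in occurrences(q, w):
        if s > covered_until:  # position covered_until is left uncovered
            return False
        covered_until = s + len(q)
    return covered_until > len(w) - len(q)


if __name__ == "__main__":
    f = fibonacci_prefix(200)
    # Lengths n <= 13 whose prefix passes the test on this finite prefix.
    print([n for n in range(1, 14) if covers_prefix(f[:n], f)])
```

A prefix that fails this test cannot be a quasiperiod of the infinite word; a prefix that passes is only a candidate, to be confirmed by an argument such as the one in Section 4.1.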
Unfortunately, some Sturmian words are not quasiperiodic [15]; more precisely, a Sturmian word is quasiperiodic if and only if it is not a Lyndon word. However, we can still extend our characterization to standard Sturmian words, i.e., Sturmian words having all their left special factors as prefixes (Section 4.3).
Sturmian words are not necessarily quasiperiodic, but their bi-infinite counterparts are always multi-scale quasiperiodic. This result can be extended to subshifts, i.e., topological spaces generated from languages by the shift operation. A subshift is quasiperiodic (resp. multi-scale quasiperiodic) if and only if it is generated by a word which is quasiperiodic (resp. multi-scale quasiperiodic). Monteil and Marcus proved [19] that all Sturmian subshifts are multi-scale quasiperiodic. They also proved that multi-scale quasiperiodic shifts have zero topological entropy, are minimal (their words are uniformly recurrent), and that all of their factors have frequencies (see [9] for a generalization to two-dimensional words).
The main tool of [19] is a so-called derivation operation, which takes the inverse image of a word by a well-chosen morphism. The derivative of a quasiperiodic word w is another word which describes the lengths of the overlaps in w between each two consecutive occurrences of its quasiperiods. While reading [19], one naturally asks whether the derivation operation preserves multi-scale quasiperiodicity. In other words, given a multi-scale quasiperiodic word, does its derivative still have infinitely many quasiperiods? In Section 3, we show this is not the case: we provide a right infinite multi-scale quasiperiodic word whose derivative is non-quasiperiodic. While discussing properties of the derivation operation, we also provide a word such that each quasiperiod has the previous one (in terms of length) as a quasiperiod. This nested effect can be avoided; we provide a multi-scale quasiperiodic word with only non-quasiperiodic quasiperiods.
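To make the idea of "lengths of the overlaps between consecutive occurrences" concrete, here is a small Python sketch. It does not implement the derivation operation of [19], which is defined through the inverse image by a morphism; it only lists the overlap lengths for one fixed quasiperiod on a finite prefix. The function name `overlap_lengths` is hypothetical.

```python
def overlap_lengths(q, w):
    """Overlap lengths between consecutive occurrences of q in w.

    An entry 0 means two occurrences are adjacent, a positive entry is the
    number of shared letters, and a negative entry would reveal a gap
    (so q would not cover w)."""
    starts = []
    i = w.find(q)
    while i != -1:
        starts.append(i)
        i = w.find(q, i + 1)  # step by one to keep overlapping occurrences
    return [len(q) - (t - s) for s, t in zip(starts, starts[1:])]


if __name__ == "__main__":
    # First 16 letters of the Fibonacci word, with the candidate quasiperiod aba.
    print(overlap_lengths("aba", "abaababaabaababa"))
```

On this prefix the occurrences of aba start at positions 0, 3, 5, 8, 11 and 13, so the sequence alternates between adjacent occurrences and occurrences sharing one letter.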
The proofs of the properties of our examples all involve our general method. Let us summarize the main parts of our paper. In Section 2, we present general properties of quasiperiods of right infinite words and a general method to determine them. In Section 3, we present our results around the derivation operation. In Section 4, we provide our proof of the characterization of the quasiperiods of the Fibonacci word and its generalization to the new characterization of standard Sturmian words.
A multiscale quasiperiodic word with all quasiperiods superprimitive
Let $q = abbababba$ and consider the morphism $\psi$ defined by:
Proposition 3.5. The quasiperiods of the infinite word $\psi^\omega(a)$ are the words $\psi^n(q)$ with $n \geq 0$. Moreover each of these quasiperiods is superprimitive.
This proposition is a synthesis of the next three lemmas.
Lemma 3.6. The word $\psi^\omega(a)$ is $\psi^n(q)$-quasiperiodic for each $n \geq 0$.
Proof. As already recalled in the proof of Proposition 3.3, for any non-erasing morphism $f$ and any infinite word $w$, if $w$ is $q$-quasiperiodic then $f(w)$ is $f(q)$-quasiperiodic. Hence to prove the lemma, we just need to prove that $\psi^\omega(a)$ is $q$-quasiperiodic. As both words obtained from $\psi(a)$ and $ab\psi(b)$ by removing their last $b$ are $q$-quasiperiodic, for any infinite word $w$, $\psi(aw)$ is $q$-quasiperiodic. In particular $\psi^\omega(a)$ is $q$-quasiperiodic.
Lemma 3.7. For any $n \geq 0$, the word $\psi^n(q)$ is superprimitive.
Proof. Assume by contradiction that $n$ is the least integer such that $\psi^n(q)$ is quasiperiodic, and let $Q$ be one of its quasiperiods. Necessarily $n \geq 1$. The word $\psi(a)ba$ is a prefix of $\psi^n(q)$. An exhaustive verification shows that a prefix of $\psi(a)ba$ is a border of $\psi^n(q)$ if and only if this prefix is of the form $(abbab)^\ell$ with $\ell \in [1; 7]$ when $n = 1$ and $\ell \in \{1, 2\}$ when $n \geq 2$. As no $abbab$-quasiperiodic word can contain the word $aa$ as a factor, no prefix of $\psi(a)ba$ can be a quasiperiod of $\psi^n(q)$.
It follows that $\psi(a)ba$ must be a prefix of $Q$. Observe that if $\psi(a)ba$ is a factor of the image by $\psi$ of a word (finite or infinite) $u$, then any occurrence of $\psi(a)ba$ in $\psi(u)$ corresponds to a prefix of the image of a suffix of $u$. Consequently, considering the last occurrence of $Q$ in $\psi^n(q)$, we deduce that $Q = \psi(Q')$ for some word $Q'$. That $Q$ is a quasiperiod of $\psi^n(q)$ means there exists a double sequence of words $(p_i, s_i)_{1 \leq i \leq k}$ such that $\psi^n(q) = p_i Q s_i$ for each $i$ in $[1; k]$, $p_1 = \varepsilon = s_k$ and, for each $i$ in $[1; k-1]$, $|p_i Q| \geq |p_{i+1}| > |p_i|$. The observation at the beginning of the paragraph implies that, for each $i$ in $[1; k]$, $p_i = \psi(p'_i)$ for some word $p'_i$. As $Q = \psi(Q')$ and as the images of letters by $\psi$ all have the same length, for each $i$ in $[1; k]$, $s_i = \psi(s'_i)$ for some word $s'_i$. Injectivity of $\psi$ implies that for each $i$ in $[1; k]$, $\psi^{n-1}(q) = p'_i Q' s'_i$. Moreover $p'_1 = \varepsilon = s'_k$. Observe that for each $i$ in $[1; k]$, $|p_i| = 35|p'_i|$ and $|Q| = 35|Q'|$. Hence for each $i$ in $[1; k-1]$, $|p'_i Q'| \geq |p'_{i+1}| > |p'_i|$. Hence $\psi^{n-1}(q)$ is $Q'$-quasiperiodic. This contradicts the minimality in the choice of $n$.
Lemma 3.8. If $Q$ is a quasiperiod of $\psi^\omega(a)$ then $Q = \psi^n(q)$ for some integer $n \geq 0$.
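The base case n = 0 of Lemma 3.7 can be checked by brute force. The Python sketch below takes "superprimitive" to mean "has no quasiperiod other than itself" and tests every proper prefix of a finite word; the function names (`is_cover`, `proper_quasiperiods`, `is_superprimitive`) are hypothetical. It does not cover the case n ≥ 1, which needs the morphism ψ and the argument given in the proof.

```python
def is_cover(q, w):
    """True if the occurrences of q cover every position of the finite word w."""
    if not q or len(q) > len(w):
        return False
    covered_until = 0  # first position not yet covered
    i = w.find(q)
    while i != -1:
        if i > covered_until:  # a position is left uncovered
            return False
        covered_until = i + len(q)
        i = w.find(q, i + 1)  # step by one to allow overlapping occurrences
    return covered_until == len(w)


def proper_quasiperiods(w):
    """All quasiperiods of w that are strictly shorter than w.

    A quasiperiod of a finite word is necessarily a prefix of it,
    so only proper prefixes need to be tested."""
    return [w[:n] for n in range(1, len(w)) if is_cover(w[:n], w)]


def is_superprimitive(w):
    """True if w has no quasiperiod other than itself."""
    return not proper_quasiperiods(w)


if __name__ == "__main__":
    print(is_superprimitive("abbababba"))  # the word q of this section
    print(proper_quasiperiods("ababa"))
```

For q = abbababba the only borders are a and abba, and neither covers q, which is why the check succeeds.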
doi:10.4230/lipics.mfcs.2016.40 dblp:conf/mfcs/GamardR16 fatcat:viporj3zyfg7fisnhywyzedmfy