The Lempel–Ziv Complexity of Fixed Points of Morphisms

Sorin Constantinescu, Lucian Ilie
2007 SIAM Journal on Discrete Mathematics  
The Lempel-Ziv complexity is a fundamental measure of complexity for words, closely connected with the famous LZ77 compression algorithm. We investigate this complexity measure for one of the most important families of infinite words in combinatorics, namely the fixed points of morphisms. We give a complete characterization of the complexity classes, which are Θ(1), Θ(log n), and Θ(n^{1/k}), k ∈ N, k ≥ 2, depending on the periodicity of the word and the growth function of the morphism. The relation with the well-known classification of Ehrenfeucht, Lee, Rozenberg, and Pansiot for factor complexity classes is also investigated. The two measures complete each other, giving an improved picture for the complexity of these infinite words.

1. Introduction. Before publishing their famous papers introducing the well-known compression schemes LZ77 and LZ78 in [36] and [37], resp., Lempel and Ziv introduced a complexity measure for words in [21] which attempted to detect "sufficiently random looking" sequences. In contrast with the fundamental measures of Kolmogorov [19] and Chaitin [4], Lempel and Ziv's measure is computable. The definition is purely combinatorial; its basic idea, splitting the word into minimal never seen before factors, proved to be at the core of the well-known compression algorithm LZ77, as well as subsequent variations. Another, closely related variant is to decompose the word into maximal already seen factors, as introduced by Crochemore [7] as a tool for algorithm design.

Lempel-Ziv-type complexity and factorizations have important applications in many areas, such as data compression [36, 37], string algorithms [7, 20, 25, 32], cryptography [26], molecular biology [5, 15, 16], and neural computing [1, 34, 35].

Lempel and Ziv [21] investigate various properties which are expected from a complexity measure which intends to detect randomness. They prove it to be subadditive and also that most (but not too many) sequences are complex; see [21] for details. Also, they test it against de Bruijn words [3], as a well-established case of complex words; de Bruijn words contain as factors all words of a given length, within the minimum possible space. Therefore, they establish the first connection with the factor complexity, which is also one of our topics.

In this paper, we investigate the Lempel-Ziv complexity from the combinatorial point of view and not from an information theoretical perspective.
Nevertheless, some implications of our results to data compression are obtained. We shall consider the Lempel-Ziv complexity for one of the most important classes of infinite words in combinatorics, namely the fixed points of morphisms. Many famous infinite words, such as Fibonacci or Thue-Morse, belong to this family; see, e.g., [23]. The fundamental nature of this measure allows for a complete characterization of the complexity of infinite fixed points of morphisms. The lowest complexity, constant,
doi:10.1137/050646846 fatcat:xduhpq7sfzer5hmu4gwlvpnavi
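
As an illustration of the factorization idea mentioned in the introduction (splitting a word into minimal never seen before factors), the following Python sketch computes such a factorization for a finite prefix of a fixed point of a morphism. It is only a sketch under stated assumptions, not the authors' formal definition: it assumes that an earlier occurrence may overlap the current factor, that the last factor may be a repeated one, and that the morphism is prolongable on the start letter. The names `fixed_point_prefix` and `lz_factorize` are hypothetical helpers introduced here for illustration.

```python
from typing import Dict, List


def fixed_point_prefix(morphism: Dict[str, str], start: str, length: int) -> str:
    """Return a prefix of the fixed point obtained by iterating `morphism` on `start`.

    Assumption: morphism[start] begins with `start`, so every iterate is a
    prefix of the next one.
    """
    word = start
    while len(word) < length:
        nxt = "".join(morphism[c] for c in word)
        if len(nxt) <= len(word):  # the word stopped growing; avoid looping forever
            break
        word = nxt
    return word[:length]


def lz_factorize(w: str) -> List[str]:
    """Split w into shortest factors that have no earlier occurrence.

    Assumption: an earlier occurrence must start before the current factor
    but may overlap it; the final factor is allowed to be a repeated one.
    """
    factors = []
    i, n = 0, len(w)
    while i < n:
        j = i + 1
        # w[i:j] has an earlier occurrence exactly when it appears inside w[0:j-1].
        while j <= n and w.find(w[i:j], 0, j - 1) != -1:
            j += 1
        j = min(j, n)
        factors.append(w[i:j])
        i = j
    return factors


fib = fixed_point_prefix({"a": "ab", "b": "a"}, "a", 13)
print(fib)                # abaababaabaab
print(lz_factorize(fib))  # ['a', 'b', 'aa', 'bab', 'aabaa', 'b']
```

Under these assumptions, the complexity of a prefix is simply `len(lz_factorize(prefix))`. The quadratic-time search with `str.find` is chosen for readability only; it is adequate for short prefixes but not for large inputs.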