On Context-Tree Prediction of Individual Sequences
IEEE Transactions on Information Theory
Motivated by the evident success of context-tree based methods in lossless data compression, we explore, in this paper, methods of the same spirit in universal prediction of individual sequences. By context-tree prediction, we refer to a family of prediction schemes where, at each time instant $t$, after having observed all outcomes of the data sequence $x_1, \ldots, x_{t-1}$ but not yet $x_t$, the prediction is based on a "context" (or a state) that consists of the $k$ most recent past outcomes $x_{t-k}, \ldots, x_{t-1}$, where the choice of $k$ may depend on the contents of a possibly longer, though limited, portion of the observed past, $x_{t-k_{\max}}, \ldots, x_{t-1}$. This is different from the study reported in , where general finite-state predictors as well as "Markov" (finite-memory) predictors of fixed order were studied in the regime of individual sequences. Another important difference between this study and  is the asymptotic regime. While in , the resources of the predictor (i.e., the number of states or the memory size) were kept fixed regardless of the length $N$ of the data sequence, here we investigate situations where the number of contexts, or states, is allowed to grow concurrently with $N$. We are primarily interested in the following fundamental question: What is the critical growth rate of the number of contexts, below which the performance of the best context-tree predictor is still universally achievable, but above which it is not? We show that this critical growth rate is linear in $N$. In particular, we propose a universal context-tree algorithm that essentially achieves optimum performance as long as the growth rate is sublinear, and show that, on the other hand, this is impossible in the linear case.
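To make the notion of a variable-length context concrete, the following is a minimal sketch of one possible predictor in this family for binary sequences. It is not the universal algorithm proposed in the paper. The function name `context_tree_predict`, the parameter `min_count`, and the rule for choosing $k$ (take the longest suffix of the past, of length at most $k_{\max}$, that has already been observed at least `min_count` times, then predict the symbol that most often followed it) are assumptions made purely for illustration.

```python
from collections import defaultdict


def context_tree_predict(seq, k_max=3, min_count=2):
    """Sequentially predict each binary symbol from a variable-length context.

    Illustrative selection rule (an assumption, not the paper's algorithm):
    use the longest suffix of the past, of length at most k_max, that has
    been observed at least min_count times, and predict the symbol that
    followed it most often so far (ties are broken in favour of 0).
    """
    # counts[context] = [times followed by 0, times followed by 1]
    counts = defaultdict(lambda: [0, 0])
    predictions = []
    for t, x_t in enumerate(seq):
        past = tuple(seq[max(0, t - k_max):t])
        # Suffixes of the past, longest first, ending with the empty context.
        candidates = [past[len(past) - k:] for k in range(len(past), -1, -1)]
        chosen = ()  # fall back to the empty context
        for ctx in candidates:
            if sum(counts[ctx]) >= min_count:
                chosen = ctx
                break
        c0, c1 = counts[chosen]
        predictions.append(1 if c1 > c0 else 0)
        # Only after predicting is x_t revealed; update every suffix context.
        for ctx in candidates:
            counts[ctx][x_t] += 1
    return predictions


seq = [0, 1] * 20
preds = context_tree_predict(seq, k_max=2)
mistakes = sum(p != x for p, x in zip(preds, seq))
print(f"{mistakes} mistakes on {len(seq)} symbols")
```

In this sketch the set of contexts actually used grows with the data, since longer suffixes become eligible only once they have been seen often enough. That loosely mirrors the regime discussed above, where the number of contexts is allowed to grow with $N$.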