Universal Filtering Via Prediction

T. Weissman, E. Ordentlich, M.J. Weinberger, A. Somekh-Baruch, N. Merhav
2007 IEEE Transactions on Information Theory  
We consider the filtering problem, where a finite-alphabet individual sequence is corrupted by a discrete memoryless channel, and the goal is to causally estimate each sequence component based on the past and present noisy observations. We establish a correspondence between the filtering problem and the problem of prediction of individual sequences which leads to the following result: Given an arbitrary finite set of filters, there exists a filter which performs, with high probability,
essentially as well as the best in the set, regardless of the underlying noiseless individual sequence. We use this relationship between the problems to derive a filter guaranteed to attain the "finite-state filterability" of any individual sequence by leveraging results from the prediction problem.

This line of work was focused on the case where estimation of the components of the noise-corrupted individual sequence needs to be done causally, which was labelled 'the sequential compound decision problem'. Early work on the compound sequential decision problem concentrated on competing with the class of time-invariant "symbol by symbol" estimation rules. Later, references [1, 2, 30, 31] extended the scope to reference classes of "Markov" estimators of a fixed and known order. Unlike the prediction problem, however, this problem seems to have largely escaped the spotlight in the recent resurgence of interest in sequential decision problems. An exception is the work in [3, 4] on filtering a Discrete Memoryless Channel (DMC)-corrupted individual sequence with respect to filters implementable as finite-state machines. Another exception is the part of the work in [35] that deals with limited-delay coding of a noise-corrupted individual sequence. (The closely related problem of prediction for noise-corrupted individual sequences was considered in [34, 36].) In compliance with more modern terminology, used e.g. in the literature on hidden Markov models [16], we henceforth use the term 'filtering' in lieu of 'compound sequential decision problem' when referring to the problem of causally estimating the components of a noise-corrupted individual sequence.

Our goal in this work is to establish a close relationship between the problem of predicting an individual sequence and that of filtering a DMC-corrupted sequence. We show that with any filter one can associate a predictor for the noisy sequence whose observable prediction loss (under the right prediction space and loss function) efficiently estimates that of the original filter (which depends also on the noiseless sequence and hence is not observable). This association allows us to transfer results on prediction relative to a set of experts to analogous results for the filtering problem: given a set of filters, one constructs a predictor competing with the associated set of predictors, using existing theory on universal prediction. The filter associated with such a competing predictor can then be shown to successfully compete with the original set of filters. In other words, this approach yields a filter performing, with high probability, at least as well as the best in a given class of filters, regardless of the underlying noise-free individual sequence.

An approach similar in spirit to the one we follow here was taken in [34] for the problem of predicting a noise-corrupted individual sequence. There too, the idea was to transform the problem into one of prediction, in the noiseless sense, of the noisy sequence under a modified loss function. The prediction space, however, remained that of the original problem. In contrast, in our filtering setting, the prediction space in the associated prediction problem will be a space of mappings from a noisy symbol to a reconstruction symbol.
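The following sketch, in Python, illustrates this correspondence in a toy setting of our own choosing (not the paper's notation): single-symbol filters over a binary alphabet corrupted by a binary symmetric channel, with Hamming loss. The point is only that an estimated loss depending on the noisy symbol alone can be constructed, by inverting the channel matrix, so that for every value of the clean symbol its conditional expectation matches that of the true, unobservable loss.

```python
import numpy as np

# Minimal illustrative setup (our choice, not the paper's notation):
# binary clean and noisy alphabets, a binary symmetric channel with
# crossover probability delta, and Hamming loss.
delta = 0.2
PI = np.array([[1 - delta, delta],
               [delta, 1 - delta]])       # PI[x, z] = P(Z = z | X = x)
LAMBDA = np.array([[0.0, 1.0],
                   [1.0, 0.0]])           # LAMBDA[x, xhat] = loss of guessing xhat

def estimated_loss(f, PI, LAMBDA):
    """Observable surrogate for the loss of a single-symbol filter f.

    f maps a noisy symbol z to a reconstruction f[z]. The returned vector
    l_hat, indexed by z, satisfies, for every clean symbol x,
        sum_z PI[x, z] * l_hat[z] == sum_z PI[x, z] * LAMBDA[x, f[z]],
    i.e. its conditional expectation equals that of the true loss, even
    though l_hat itself never looks at x.
    """
    # h[x]: expected true loss of f when the clean symbol is x
    h = np.array([PI[x] @ LAMBDA[x, f] for x in range(PI.shape[0])])
    # Invert the channel: solve PI @ l_hat = h (PI assumed invertible)
    return np.linalg.solve(PI, h)

# Example: the "say what you see" filter f(z) = z
f = np.array([0, 1])
l_hat = estimated_loss(f, PI, LAMBDA)

# Sanity check of the defining (unbiasedness) property for both clean symbols
for x in range(2):
    assert np.isclose(PI[x] @ l_hat, PI[x] @ LAMBDA[x, f])
print(l_hat)   # -> [0.2, 0.2] for this channel and filter
```

Summed over time, such per-symbol estimates concentrate around the true cumulative loss, which is what makes the observable prediction loss a faithful proxy for the (unobservable) filtering loss.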
Note that the idea of introducing a modified loss function (or distortion measure) to reduce a problem involving noise to a more familiar and basic noiseless one is used in other contexts as well. For example, rate distortion coding of noisy sources is readily reduced to the classical (noiseless) rate distortion problem under an appropriately modified distortion measure.
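Returning to the filtering setting, the sketch below (again an illustrative toy under the same assumed setup, not the paper's actual construction) shows the second ingredient described above: a standard exponential-weighting predictor over a small reference class of single-symbol filters, driven only by the observable estimated losses of the noisy sequence. The resulting randomized filter never sees the clean sequence, yet its true loss can be compared after the fact with that of the best filter in the class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same illustrative setup as in the previous sketch, repeated so this
# block is self-contained: binary alphabets, a BSC-like DMC, Hamming loss.
delta = 0.2
PI = np.array([[1 - delta, delta],
               [delta, 1 - delta]])                    # PI[x, z] = P(Z = z | X = x)
LAMBDA = np.array([[0.0, 1.0],
                   [1.0, 0.0]])                        # Hamming loss

def estimated_loss(f, PI, LAMBDA):
    # Observable estimate of the loss of filter f (see the previous sketch)
    h = np.array([PI[x] @ LAMBDA[x, f] for x in range(PI.shape[0])])
    return np.linalg.solve(PI, h)

# Reference class: the four maps from the noisy alphabet {0, 1} to {0, 1}
filters = [np.array(f) for f in ([0, 0], [0, 1], [1, 0], [1, 1])]
L_HAT = np.stack([estimated_loss(f, PI, LAMBDA) for f in filters])  # shape (filter, z)

# An arbitrary noiseless individual sequence and its DMC-corrupted observation
n = 10_000
x_seq = (np.arange(n) // 50) % 2                       # deterministic "individual" pattern
z_seq = np.where(rng.random(n) < delta, 1 - x_seq, x_seq)

# Exponential weighting driven only by the observable estimated losses
eta = np.sqrt(8 * np.log(len(filters)) / n)            # textbook learning rate
cum_est = np.zeros(len(filters))                       # cumulative estimated losses
alg_loss = 0.0
expert_loss = np.zeros(len(filters))
for t in range(n):
    w = np.exp(-eta * cum_est)
    w /= w.sum()
    k = rng.choice(len(filters), p=w)                  # randomized filter choice
    alg_loss += LAMBDA[x_seq[t], filters[k][z_seq[t]]]
    expert_loss += LAMBDA[x_seq[t], [f[z_seq[t]] for f in filters]]
    cum_est += L_HAT[:, z_seq[t]]                      # update uses noisy data only

print("scheme:", alg_loss / n, " best filter in class:", expert_loss.min() / n)
```

The learning rate here is the standard choice for exponential weighting with a known horizon; the scheme developed in the paper additionally handles randomization, the range of the modified loss, and competition with finite-state reference classes rather than this fixed finite set.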
doi:10.1109/tit.2007.892782