Convergence and loss bounds for Bayesian sequence prediction

M. Hutter
2003 IEEE Transactions on Information Theory  
The probability of observing x_t at time t, given the past observations x_1...x_{t-1}, can be computed with Bayes' rule if the true generating distribution μ of the sequences x_1x_2x_3... is known. If μ is unknown but known to belong to a class M, one can base one's prediction on the Bayes mixture ξ, defined as a weighted sum of the distributions ν∈M. Various convergence results of the mixture posterior ξ_t to the true posterior μ_t are presented. In particular, a new (elementary) derivation of the convergence ξ_t/μ_t → 1 is provided, which additionally gives the rate of convergence. A general sequence predictor is allowed to choose an action y_t based on x_1...x_{t-1} and receives loss ℓ_{x_t y_t} if x_t is the next symbol of the sequence. No assumptions are made on the structure of ℓ (apart from boundedness) or on M. The Bayes-optimal prediction scheme Λ_ξ based on the mixture ξ and the Bayes-optimal informed prediction scheme Λ_μ are defined, and the total loss L_ξ of Λ_ξ is bounded in terms of the total loss L_μ of Λ_μ. It is shown that L_ξ is bounded for bounded L_μ, and that L_ξ/L_μ → 1 for L_μ → ∞. Convergence of the instantaneous losses is also proven.
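The setup above can be illustrated with a minimal numerical sketch. Assume (purely for illustration; the paper allows arbitrary bounded losses and arbitrary classes M) a finite class of i.i.d. Bernoulli sources and 0/1 loss. The mixture predictor Λ_ξ updates posterior weights by Bayes' rule and acts to minimize its ξ-expected loss; as data accumulate, the mixture posterior approaches the true posterior and the two predictors' losses track each other:

```python
# Illustrative sketch (not the paper's construction): Bayes mixture ξ over a
# finite class M of Bernoulli(θ) sources, with the true μ ∈ M, under 0/1 loss.
import random

random.seed(0)

thetas = [0.2, 0.5, 0.8]       # class M: candidate Bernoulli parameters ν
true_theta = 0.8               # true generating distribution μ ∈ M
weights = [1/3, 1/3, 1/3]      # prior weights w_ν over M

def bern(theta, x):
    """ν(x) for a Bernoulli(theta) source, x ∈ {0, 1}."""
    return theta if x == 1 else 1 - theta

L_xi = L_mu = 0                # total losses of Λ_ξ and Λ_μ
loss = [[0, 1], [1, 0]]        # 0/1 loss matrix ℓ_{x y}

for t in range(300):
    # Predictive probabilities of x_t = 1 given x_1...x_{t-1}.
    xi1 = sum(w * th for w, th in zip(weights, thetas))   # ξ_t(1)
    mu1 = true_theta                                       # μ_t(1)
    # Bayes-optimal actions: minimize expected loss under ξ resp. μ.
    y_xi = 1 if xi1 >= 0.5 else 0                          # Λ_ξ
    y_mu = 1 if mu1 >= 0.5 else 0                          # Λ_μ
    # Nature draws the next symbol from μ; both predictors incur loss.
    x = 1 if random.random() < true_theta else 0
    L_xi += loss[x][y_xi]
    L_mu += loss[x][y_mu]
    # Bayes update of the posterior weights: w_ν ∝ w_ν ν(x).
    px = [bern(th, x) for th in thetas]
    xi_x = sum(w * p for w, p in zip(weights, px))          # ξ_t(x)
    weights = [w * p / xi_x for w, p in zip(weights, px)]
    ratio = xi_x / bern(true_theta, x)                      # ξ_t/μ_t
```

After enough observations the posterior weight concentrates on the true parameter, so the ratio `ξ_t/μ_t` approaches 1 and Λ_ξ incurs at most a bounded amount of extra loss relative to the informed predictor Λ_μ, mirroring the bounds stated in the abstract.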
doi:10.1109/tit.2003.814488