Prediction by quantization of a conditional distribution

Jean-Michel Loubes, Bruno Pelletier
2017 Electronic Journal of Statistics  
Given a pair of random vectors (X, Y), we consider the problem of approximating Y by c(X) = {c_1(X), ..., c_M(X)}, where c is a measurable set-valued function. We give meaning to the approximation by using the principles of vector quantization, which leads to the definition of a multifunction regression problem. The formulated problem amounts to quantizing the conditional distributions of Y given X. We propose a nonparametric estimate of the solutions of the multifunction regression problem by combining the method of M-means clustering with the nonparametric smoothing technique of k-nearest neighbors. We provide an asymptotic analysis of the estimate and derive a convergence rate for its excess risk. The proposed methodology is illustrated on simulated examples and on a speed-flow traffic data set arising in road traffic forecasting.
doi:10.1214/17-ejs1296 fatcat:wcbwk37yuvfrjh6o2jlsy7qavq
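
The combination of k-nearest neighbors and M-means described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Euclidean distance, plain Lloyd iterations for the clustering step, and synthetic data in which Y given X is bimodal. The function names (knn_indices, m_means, local_quantizer, demo) and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def knn_indices(X, x, k):
    """Indices of the k rows of X closest to the query x (Euclidean distance)."""
    dist = np.linalg.norm(X - x, axis=1)
    return np.argsort(dist)[:k]


def m_means(Y, M, n_iter=50, seed=0):
    """Plain Lloyd iterations on the rows of Y; returns an (M, d) array of centers."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with M distinct sample points.
    centers = Y[rng.choice(len(Y), size=M, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, M).
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(M):
            members = Y[labels == j]
            if len(members) > 0:  # keep the old center if a cell is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


def local_quantizer(X, Y, x, k, M):
    """Estimate {c_1(x), ..., c_M(x)} by clustering the responses of the k nearest neighbors of x."""
    idx = knn_indices(X, x, k)
    return m_means(Y[idx], M)


def demo():
    """Synthetic example (an assumption, not data from the paper): Y = +/- X + noise."""
    rng = np.random.default_rng(42)
    n = 2000
    X = rng.uniform(0.0, 1.0, size=(n, 1))
    sign = rng.choice([-1.0, 1.0], size=n)
    Y = (sign * X[:, 0] + 0.05 * rng.normal(size=n)).reshape(-1, 1)
    # Quantize the conditional distribution of Y given X = 0.8 with M = 2 points.
    return local_quantizer(X, Y, np.array([0.8]), k=200, M=2)


if __name__ == "__main__":
    print(demo())
```

In this toy setting the conditional mean of Y given X = x is close to zero and carries little information, whereas the two local centers should land near the two conditional modes, -x and +x. That is the motivation for predicting with a set of M points rather than a single regression value.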