Budgeted Nonparametric Learning from Data Streams

Ryan Gomes, Andreas Krause
2010 International Conference on Machine Learning  
We consider the problem of extracting informative exemplars from a data stream. Examples of this problem include exemplar-based clustering and nonparametric inference such as Gaussian process regression on massive data sets. We show that these problems require maximization of a submodular function that captures the informativeness of a set of exemplars, over a data stream. We develop an efficient algorithm, Stream-Greedy, which is guaranteed to obtain a constant fraction of the value achieved by the optimal solution to this NP-hard optimization problem. We extensively evaluate our algorithm on large real-world data sets.
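The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of selecting a budget of k exemplars from a stream by maximizing a monotone submodular objective. It is not the authors' Stream-Greedy. The swap rule, the facility-location objective, the fixed evaluation sample, and the names `similarity`, `coverage`, `stream_swap_select`, and `min_gain` are all assumptions made for illustration.

```python
import math


def similarity(a, b):
    """Gaussian similarity between two 1-D points (an illustrative choice)."""
    return math.exp(-(a - b) ** 2)


def coverage(exemplars, sample):
    """Facility-location objective (assumed stand-in for informativeness).

    Each sample point is scored by its most similar exemplar. With
    non-negative similarities this set function is monotone and submodular.
    """
    if not exemplars:
        return 0.0
    return sum(max(similarity(v, e) for e in exemplars) for v in sample)


def stream_swap_select(stream, sample, k, min_gain=1e-9):
    """Keep at most k exemplars while reading the stream once.

    Hypothetical swap rule: fill the budget with the first k items, then
    replace a stored exemplar with the incoming item only when the best
    single swap raises the objective by more than min_gain.
    """
    selected = []
    for x in stream:
        if len(selected) < k:
            selected.append(x)
            continue
        best_value = coverage(selected, sample)
        best_idx = None
        for i in range(k):
            candidate = selected[:i] + [x] + selected[i + 1:]
            value = coverage(candidate, sample)
            if value > best_value + min_gain:
                best_value, best_idx = value, i
        if best_idx is not None:
            selected[best_idx] = x
    return selected


# Toy data: three well-separated groups on the real line.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 10.0]
result = stream_swap_select(points, sample=points, k=3)
```

On this toy input the first three items all come from one group, and the later swaps replace two of them with points from the other groups. A sketch like this carries no approximation guarantee; the constant-factor guarantee stated in the abstract applies to the authors' algorithm under the conditions given in the paper.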
dblp:conf/icml/GomesK10 fatcat:f5dxjrstkvhvzj67n37b6n4fzy