Nearly tight sample complexity bounds for learning mixtures of Gaussians via sample compression schemes

Hassan Ashtiani, Shai Ben-David, Nicholas J. A. Harvey, Christopher Liaw, Abbas Mehrabian, Yaniv Plan
2018 Neural Information Processing Systems  
We prove that Θ(kd 2 /ε 2 ) samples are necessary and sufficient for learning a mixture of k Gaussians in R d , up to error ε in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that O(kd/ε 2 ) samples suffice, matching a known lower bound. The upper bound is based on a novel technique for distribution learning based on a notion of sample compression. Any class of distributions that allows such
more » ... a sample compression scheme can also be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in R d has a small-sized sample compression.
dblp:conf/nips/AshtianiBHLMP18 fatcat:vklzdgu4ord2jkgudks6ky5xlm