Learning mixtures of separated nonspherical Gaussians

Sanjeev Arora, Ravi Kannan
2005 The Annals of Applied Probability  
Mixtures of Gaussian (or normal) distributions arise in a variety of application areas. Many heuristics have been proposed for the task of finding the component Gaussians given samples from the mixture, such as the EM algorithm, a local-search heuristic from Dempster, Laird and Rubin [J. Roy. Statist. Soc. Ser. B 39 (1977) 1-38]. These do not provably run in polynomial time. We present the first algorithm that provably learns the component Gaussians in time that is polynomial in the dimension.
The Gaussians may have arbitrary shape, but they must satisfy a "separation condition" which places a lower bound on the distance between the centers of any two component Gaussians. The mathematical results at the heart of our proof are "distance concentration" results--proved using isoperimetric inequalities--which establish bounds on the probability distribution of the distance between a pair of points generated according to the mixture. We also formalize the more general problem of max-likelihood fit of a Gaussian mixture to unstructured data.
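The distance-concentration idea can be illustrated numerically. The following is a minimal sketch, not the paper's algorithm: it assumes two spherical, equal-variance components (a simpler setting than the nonspherical one the abstract addresses), and the function names, the dimension and the separation value are hypothetical choices made for illustration only.

```python
import math
import random


def sample_point(center, sigma, rng):
    """Draw one point from a spherical Gaussian N(center, sigma^2 * I)."""
    return [c + rng.gauss(0.0, sigma) for c in center]


def squared_distance(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean_and_std(values):
    """Empirical mean and (population) standard deviation."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var)


def distance_stats(n, separation, sigma=1.0, pairs=300, seed=0):
    """Compare squared distances for same-component and cross-component pairs.

    Assumption: two spherical components in n dimensions whose centers
    differ by `separation` along the first coordinate.
    """
    rng = random.Random(seed)
    center_a = [0.0] * n
    center_b = [separation] + [0.0] * (n - 1)

    same = [
        squared_distance(sample_point(center_a, sigma, rng),
                         sample_point(center_a, sigma, rng))
        for _ in range(pairs)
    ]
    cross = [
        squared_distance(sample_point(center_a, sigma, rng),
                         sample_point(center_b, sigma, rng))
        for _ in range(pairs)
    ]
    return mean_and_std(same), mean_and_std(cross)


if __name__ == "__main__":
    (same_mean, same_std), (cross_mean, cross_std) = distance_stats(400, 20.0)
    print("same component :", round(same_mean, 1), "+/-", round(same_std, 1))
    print("cross component:", round(cross_mean, 1), "+/-", round(cross_std, 1))
```

In this simplified setting the squared distance between two points of the same component has mean 2σ²n, while for points from different components the mean is 2σ²n + d², where d is the distance between the centers. Both spreads grow only on the order of √n, so with enough separation the two kinds of pairs can be told apart by distance alone.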
doi:10.1214/105051604000000512 fatcat:t726jb7skvbnfpxamqxspn5t2e