The Internet Archive has a preservation copy of this work in our general collections.
The file type is application/pdf.
Streaming and Sublinear Approximation of Entropy and Information Distances
[article]
2005
arXiv
pre-print
In many problems in data mining and machine learning, data items that need to be clustered or classified are not points in a high-dimensional space, but are distributions (points on a high-dimensional simplex). For distributions, natural measures of distance are not the ℓ_p norms and variants, but information-theoretic measures like the Kullback-Leibler distance, the Hellinger distance, and others. Efficient estimation of these distances is a key component in algorithms for manipulating
arXiv:cs/0508122v2
fatcat:6t3pruhej5h5deybzpl5xqf6nq