Efficient estimation of the cardinality of large data sets
2006
Discrete Mathematics & Theoretical Computer Science
Giroire has recently proposed an algorithm which returns the $\textit{approximate}$ number of distinct elements in a large sequence of words, under strong constraints coming from the analysis of large databases. His estimation is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimation, using Kullback information and estimation theory.
doi:10.46298/dmtcs.3492
fatcat:s4hq46qhozg75hp7vqf6ncfv5i
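
The abstract refers to estimating the number of distinct words from statistical properties of uniform random variables in $[0,1]$. As a rough illustration of that general idea only, the sketch below hashes each word to a pseudo-uniform value and estimates the cardinality from the k-th smallest distinct hash (the classical "k minimum values" approach). It is neither Giroire's estimator nor the optimal estimator proposed in this note; the names `hash_to_unit` and `estimate_cardinality` and the parameter `k` are hypothetical, and SHA-256 is merely assumed to behave like a uniform hash.

```python
import hashlib
import heapq


def hash_to_unit(word: str) -> float:
    """Map a word to a pseudo-uniform value in [0, 1] (assumes SHA-256 mixes well)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_cardinality(words, k: int = 256) -> float:
    """Illustrative k-minimum-values estimate of the number of distinct words.

    Keeps only the k smallest distinct hash values, so memory stays O(k)
    regardless of the length of the input sequence.
    """
    heap = []        # max-heap (values negated) of the k smallest hashes seen
    members = set()  # same values as the heap, for duplicate detection
    for w in words:
        u = hash_to_unit(w)
        if u in members:
            continue  # repeated word (or already-tracked hash): ignore
        if len(heap) < k:
            heapq.heappush(heap, -u)
            members.add(u)
        elif u < -heap[0]:
            # u beats the current k-th smallest value: swap it in
            evicted = -heapq.heappushpop(heap, -u)
            members.discard(evicted)
            members.add(u)
    if len(heap) < k:
        # Fewer than k distinct hashes were seen, so the count is exact
        return float(len(heap))
    kth_smallest = -heap[0]
    # For n uniform draws the k-th smallest is about k / (n + 1),
    # which motivates the estimate (k - 1) / kth_smallest
    return (k - 1) / kth_smallest
```

Duplicates hash to the same value, so only distinct words influence the result. A larger `k` trades memory for accuracy: the relative error of this kind of estimator shrinks roughly like $1/\sqrt{k}$.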