Carlos Ordonez, Paul Cereghini
2000 SIGMOD Record
Clustering is one of the most important tasks performed in Data Mining applications. This paper presents an efficient SQL implementation of the EM algorithm to perform clustering in very large databases. Our version can effectively handle high-dimensional data, a high number of clusters and, more importantly, a very large number of data records. We present three strategies to implement EM in SQL: horizontal, vertical and a hybrid one. We expect this work to be useful for data mining programmers and users who want to cluster large data sets inside a relational DBMS.
doi:10.1145/335191.335468 fatcat:5tnsroizqfhxxls7y7b4jrss4q