Clustered subset selection and its applications on IT service metrics

Christos Boutsidis, Jimeng Sun, Nikos Anerousis
2008 Proceeding of the 17th ACM conference on Information and knowledge management - CIKM '08
Motivated by the enormous amounts of data collected in a large IT service provider organization, this paper presents a method for quickly and automatically summarizing and extracting meaningful insights from the data. Termed Clustered Subset Selection (CSS), our method enables program-guided data explorations of high-dimensional data matrices. CSS combines clustering and subset selection into a coherent and intuitive method for data analysis. In addition to a general framework, we introduce a family of CSS algorithms with different clustering components, such as k-means and Close-to-Rank-One (CRO) clustering, and Subset Selection components, such as best rank-one approximation and Rank-Revealing QR (RRQR) decomposition. From an empirical perspective, we illustrate that CSS achieves significant improvements over existing Subset Selection methods in terms of approximation errors. Compared to existing Subset Selection techniques, CSS also provides additional insight about clusters and cluster representatives. Finally, we present a case study of program-guided data explorations using CSS on a large collection of IT service delivery data.
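The combination described in the abstract (cluster the columns, then pick one representative column per cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the plain Lloyd's k-means, the cosine-alignment rule used to pick a representative from the cluster's best rank-one approximation, and the synthetic test matrix are all assumptions made here for illustration. The relative error reported is that of projecting the matrix onto the span of the selected columns.

```python
import numpy as np

def kmeans_columns(A, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means over the columns of A; returns one label per column."""
    rng = np.random.default_rng(seed)
    X = A.T  # one row per column of A
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_iter):
        # Squared distance from every column to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def rank_one_representative(A, cols):
    """Pick the column in `cols` most aligned with the cluster's top left
    singular vector (an assumed selection rule, not taken from the paper)."""
    block = A[:, cols]
    u = np.linalg.svd(block, full_matrices=False)[0][:, 0]
    norms = np.linalg.norm(block, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty columns
    scores = np.abs(u @ block) / norms
    return int(cols[int(scores.argmax())])

def clustered_subset_selection(A, k, seed=0):
    """Cluster the columns of A, keep one representative per non-empty cluster,
    and report the relative Frobenius error of projecting A onto those columns."""
    A = np.asarray(A, dtype=float)
    labels = kmeans_columns(A, k, seed=seed)
    reps = []
    for j in range(k):
        cols = np.flatnonzero(labels == j)
        if cols.size:
            reps.append(rank_one_representative(A, cols))
    C = A[:, reps]
    residual = A - C @ np.linalg.pinv(C) @ A
    rel_err = np.linalg.norm(residual) / np.linalg.norm(A)
    return reps, labels, rel_err

# Synthetic data (hypothetical): three groups of five nearly identical columns.
rng = np.random.default_rng(1)
base = rng.normal(size=(40, 3))
A = np.hstack([base[:, [j]] + 0.01 * rng.normal(size=(40, 5)) for j in range(3)])
reps, labels, rel_err = clustered_subset_selection(A, k=3)
print("representative columns:", reps)
print("relative approximation error:", rel_err)
```

Besides the selected columns, the sketch returns the cluster label of every column, which mirrors the abstract's point that CSS yields cluster structure in addition to a column subset. Swapping in a pivoted QR factorization for the per-cluster selection step would be the analogous RRQR-style variant.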
doi:10.1145/1458082.1458162 dblp:conf/cikm/BoutsidisSA08 fatcat:xrjesqsgv5azvai6yk3r64pxlm