Joint Cluster Analysis of Attribute Data and Relationship Data: the Connected k-Center Problem [chapter]

Martin Ester, Rong Ge, Byron J. Gao, Zengjian Hu, Boaz Ben-Moshe
2006 Proceedings of the 2006 SIAM International Conference on Data Mining  
Attribute data and relationship data are two principal types of data, representing the intrinsic and extrinsic properties of entities. While attribute data has been the main source of data for cluster analysis, relationship data such as social networks or metabolic networks are becoming increasingly available. It is also common to observe that the two data types carry orthogonal information, as in market segmentation and community identification, which calls for a joint cluster analysis of both data types so as to achieve more accurate results. For this purpose, we introduce the novel Connected k-Center problem, which takes into account attribute data as well as relationship data. We analyze the complexity of this problem and prove its NP-completeness. We also present a constant factor approximation algorithm, based on which we further design NetScan, a heuristic algorithm that is efficient for large, real databases. Our experimental evaluation demonstrates the meaningfulness and accuracy of the NetScan results.
doi:10.1137/1.9781611972764.22 dblp:conf/sdm/EsterGGHB06 fatcat:xfgwq6vdjzdjhc7vpgoldxskoe