OCMiner: A density-based overlapping community detection method for social networks

Sajid Yousuf Bhat, Muhammad Abulaish
2015 Intelligent Data Analysis  
Community detection is an important task for identifying the structure and function of complex networks. The task is challenging as communities often show overlapping and hierarchical behavior, i.e., a node can belong to multiple communities, and multiple smaller communities can be embedded within a larger community. Moreover, real-world networks often contain communities of arbitrary size and shape, along with outliers. This paper presents a novel density-based overlapping community detection
method, OCMiner, to identify overlapping community structures in social networks. Unlike other density-based community detection methods, OCMiner does not require users to set the neighborhood threshold parameter (ε). Determining an optimal value for ε is a longstanding and challenging task for density-based clustering methods. Instead, OCMiner automatically determines the neighborhood threshold parameter for each node locally from the underlying network. It also uses a novel distance function that utilizes edge weights in weighted networks, while still being able to find communities in un-weighted networks. The efficacy of the proposed method has been established through experiments on various real-world and synthetic networks. In comparison to the existing state-of-the-art community detection methods, OCMiner is computationally faster, scalable to large-scale networks, and able to find significant community structures in social networks.
[18] along with modularity-based algorithms [8, 37] and likelihood-based algorithms [7] have been developed for community detection in social networks. Community detection in a network depends on various factors, including whether the definition of community relies on global or local network properties, whether nodes can simultaneously belong to several communities, whether link weights are utilized, whether outliers are considered, and whether the community definition allows for hierarchical structure. The fact that nodes in a network can belong to more than one community, together with the k-clique percolation solution proposed by Palla et al. [38], has drawn increased attention to the problem of overlapping community detection in social networks. Although most methods consider communities to overlap only at their boundaries, some allow central vertices of communities to overlap, making the characterization of overlapping vertices unclear [15].
Here, we argue that in a real-world network a central vertex of one community can also be a boundary vertex of another, overlapping community. Besides overlapping communities, real-world social networks often show a hierarchical organization in their community structure. In such cases, multiple smaller communities at lower levels form a larger community at a higher level, or a community at a lower level may be part of even larger communities at higher levels. It thus becomes important to identify both overlapping community structures and their hierarchical organization in such networks to provide an appropriate representation of communities. Hierarchical clustering is a well-known technique used in social network analysis [52, 44] to naturally create a hierarchical tree of partitions, called a dendrogram. However, such a method does not consider overlaps and produces all possible partitions based on the similarity measure used, without regard to the quality of the identified community structures. Recently, a class of community detection methods [27, 41], called multi-resolution methods, has started to evolve; these methods share the general property of having a tunable parameter that adjusts the characteristic size of the communities to be detected. Varying the value of the resolution parameter enables such methods to detect community structures at varying levels of resolution and thus form a hierarchical organization of community structures for a network. In online social networks (OSNs) like Facebook and Twitter, community structures have mostly been analyzed using traditional community detection techniques over un-weighted social graphs representing explicit relations (friends, colleagues, etc.) among users. However, in order to identify functional communities in OSNs, it is necessary to take users' interaction data (posts, blogs, chats, comments, etc.) into consideration as well.
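As a rough illustration of the distinction drawn above between an un-weighted friendship graph and a graph that takes interaction data into account, the following minimal Python sketch aggregates interaction records into edge weights. The function names, the record format (sender, receiver), and the toy data are assumptions made for illustration only; they are not taken from the paper and do not represent OCMiner's distance function.

```python
from collections import defaultdict


def build_activity_graph(interactions):
    """Aggregate (sender, receiver) records into undirected edge weights.

    Hypothetical helper: the weight of an edge is simply the number of
    recorded interactions between its two endpoints.
    """
    weights = defaultdict(int)
    for sender, receiver in interactions:
        if sender == receiver:
            continue  # ignore self-interactions
        edge = tuple(sorted((sender, receiver)))
        weights[edge] += 1
    return dict(weights)


def active_friend_fraction(user, friends, activity_graph):
    """Fraction of a user's declared friends with at least one interaction."""
    if not friends:
        return 0.0
    active = sum(
        1 for friend in friends
        if tuple(sorted((user, friend))) in activity_graph
    )
    return active / len(friends)


# Toy data (invented): declared friendships and an interaction log.
friends = {"ann": {"bob", "cat", "dan", "eve"}}
interactions = [("ann", "bob"), ("bob", "ann"), ("ann", "cat"), ("ann", "bob")]

graph = build_activity_graph(interactions)
fraction = active_friend_fraction("ann", friends["ann"], graph)
```

In this toy example the friendship graph gives "ann" four equally weighted neighbors, whereas the interaction-weighted graph keeps only the two friends she actually interacts with and distinguishes them by weight.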
Through these interactions, users gradually form social groups or communities based on shared values and interests, which are quite different from traditional communities formed on the basis of geographical location [54]. Analyses by Wilson et al. [56] and Viswanath et al. [51] of Facebook friendship and interaction data reveal that most users interact with only a small subset of their declared social group, so only that subset actually represents interactive relationships. Their results show that, for the majority of users, a large part of their interactions occurs across only a small subset (as low as 20%) of their declared friends. Nearly all users can attribute 100% of their interactions to only 60% of their friends, and for the majority of users all interactions are reciprocated. In terms of interaction degree, nodes in OSNs are more likely to link to other nodes of similar degree in the interaction network than in the friend network. That is, the interaction network shows higher assortativity than the friend network, which places it closer to known social networks. These findings suggest that social-network-based systems should be built on the activity network rather than on the friend network. The activity network of an OSN can be treated as a weighted graph, and a community detection
doi:10.3233/ida-150751 fatcat:vpd4wt3ynrcylavyk4yfkpw25u