Predicting and Identifying Missing Node Information in Social Networks

Ron Eyal, Avi Rosenfeld, Sigal Sina, Sarit Kraus
2013 ACM Transactions on Knowledge Discovery from Data  
In recent years, social networks have surged in popularity. One key aspect of social network research is identifying important missing information that is not explicitly represented in the network or is not visible to all. To date, this line of research has typically focused on finding the connections that are missing between nodes, a challenge typically termed the Link Prediction Problem. This paper introduces the Missing Node Identification problem, where missing members in the social network structure must be identified. In this problem, indications of missing nodes are assumed to exist. Given these indications and a partial network, we must assess which indications originate from the same missing node and determine the full network structure. Towards solving this problem, we present the MISC Algorithm (Missing node Identification by Spectral Clustering), an approach based on a spectral clustering algorithm, combined with nodes' pairwise affinity measures adopted from link prediction research. We evaluate the performance of our approach in different problem settings and scenarios, using real-life data from Facebook. The results show that our approach can be effective in solving the Missing Node Identification Problem. In addition, this paper presents R-MISC, which uses a sparse matrix representation, efficient algorithms for calculating the nodes' pairwise affinity, and a proprietary dimension reduction technique to enable scaling the MISC algorithm to large networks of more than 100,000 nodes. Last, we consider problem settings where some of the indications are unknown. Two algorithms are suggested for this problem: Speculative MISC, based on MISC, and Missing Link Completion, based on classical link prediction literature. We show that Speculative MISC outperforms Missing Link Completion.
doi:10.1145/2536775 fatcat:m35kzk7435enznarlyr7wgk7hy
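
The abstract names two ingredients, pairwise affinity measures borrowed from link prediction and spectral clustering, without giving their details. The sketch below is only a rough illustration of how such ingredients can fit together, not the MISC algorithm itself. It assumes a Jaccard affinity over closed neighborhoods (one common link-prediction measure, chosen here purely for illustration) and the simplest spectral split, the sign of the Fiedler vector, found by power iteration. The function names `jaccard_affinity` and `fiedler_split` and the toy graph are hypothetical.

```python
# Illustrative sketch only: NOT the MISC algorithm from the paper.
# Assumptions: Jaccard affinity over closed neighborhoods, and a two-way
# spectral split by the sign of the Fiedler vector (power iteration).


def jaccard_affinity(adj):
    """Return an n x n matrix of Jaccard similarities.

    adj maps each node index 0..n-1 to a list of its neighbors.
    Each node is added to its own neighborhood (closed neighborhood).
    """
    n = len(adj)
    closed = [set(adj[i]) | {i} for i in range(n)]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(closed[i] & closed[j])
            total = len(closed[i] | closed[j])
            w[i][j] = w[j][i] = shared / total
    return w


def fiedler_split(w, iterations=500):
    """Split nodes into two groups using the sign of the Fiedler vector.

    The Laplacian is L = D - W. Power iteration on (shift * I - L), with the
    constant vector projected out, converges to the eigenvector of the
    second-smallest eigenvalue of L when that eigenvalue is simple.
    """
    n = len(w)
    degree = [sum(row) for row in w]
    shift = 2.0 * max(degree)  # upper bound on L's eigenvalues (Gershgorin)
    v = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        v = [
            (shift - degree[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(x * x for x in v) ** 0.5
        if norm == 0.0:  # degenerate input, e.g. all affinities are zero
            break
        v = [x / norm for x in v]
    group_a = {i for i in range(n) if v[i] < 0}
    group_b = set(range(n)) - group_a
    return group_a, group_b


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}

if __name__ == "__main__":
    affinity = jaccard_affinity(toy_graph)
    print(fiedler_split(affinity))
```

On the toy graph, the two triangles end up in separate groups. A fuller treatment would cluster into more than two groups and would use whichever affinity measures and clustering steps the paper actually specifies.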