Analysis of Grasshopper, a Novel Social Network De-anonymization Algorithm

Benedek Simon, Gábor György Gulyás, Sándor Imre
2014 Periodica Polytechnica Electrical Engineering and Computer Science  
Social networks play an important, possibly key, role in today's society. Alongside their benefits, serious privacy concerns emerge: algorithms called de-anonymization attacks are capable of re-identifying large fractions of anonymously published networks. A strong class of these attacks uses solely the network structure to achieve its goals. In this paper we propose a novel structural de-anonymization attack called Grasshopper. Through measurements we compare Grasshopper to the state-of-the-art algorithm and highlight its enhanced capabilities, such as negligible error rates and yield levels that were not possible before, namely in cases when there is greater noise in the background knowledge. We furthermore evaluate an anonymity measure for the Grasshopper algorithm, which enables the approximate ranking of nodes according to their re-identification rates. Finally, we characterize the robustness of Grasshopper in tackling identity separation, a privacy-enhancing technique that facilitates the hiding of structural information.

Keywords social network · privacy · anonymity

1 Introduction

Most social networking services provide interfaces for managing social relationships, while others focus on enabling collaboration among their users. A useful feature of these services is that they are supported by an underlying (and occasionally only implicitly existing) graph structure. However, besides the value these services bring to humanity, social media also serves as an optimal platform for all kinds of surveillance activities: members can snoop on each other, commercial parties can access vast amounts of private data, and, as recent events confirm [4], government surveillance is present as well. It is therefore crucial to investigate privacy issues beyond the use of related settings. In this paper we consider how the graph structure can be abused to violate user privacy. There are several ways to access anonymized datasets; for example, someone can obtain a dataset that was previously released for business or research purposes. While such a dataset should contain private attributes without explicit identifiers, a malicious third party can try to re-identify nodes by using their relationships. If successful, the private information could be used (and monetized) together with real identities.
The basic idea of this type of attack is to use structural data from another social network to execute an iterative re-identification algorithm. Despite the difficult nature of the problem, several attacks have been published recently that are able to breach user privacy at large scale, even in networks having hundreds of thousands of nodes [21].

Let us now illustrate how these attacks work with a simple example. An adversary obtains the datasets depicted in Fig. 1a (background knowledge) and Fig. 1b (sanitized dataset), wishing to learn an otherwise inaccessible private attribute by structural de-anonymization: who is a democrat or republican voter in the public network. Initially, the attacker re-identifies (or maps) ν_Dave ↔ ν_3 and ν_Fred ↔ ν_2, as they have the globally highest matching degree values in both networks. He then continues with local re-identification by inspecting nodes related to the ones already re-identified. He therefore picks ν_Ed, the highest-degree common neighbor of (ν_Dave, ν_Fred), and maps it as ν_Ed ↔ ν_7, since ν_7 is the only node that neighbors both ν_2 and ν_3 and has a degree of 3. This simple algorithm can continue iterating through unmapped nodes, discovering further possible mappings (e.g., ν_Harry ↔ ν_1, ν_Carol ↔ ν_6).
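
The simple attack in this example (not Grasshopper itself) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: Fig. 1 is not reproduced here, so the edge lists below are hypothetical and were chosen only to be consistent with the mappings named in the text; the nodes Alice, Bob, 4 and 5 and the function names `degree_seeds` and `propagate` are likewise invented for illustration. The sketch also assumes noise-free data, i.e. it requires exact degree equality, which real attacks cannot rely on.

```python
# Hypothetical toy graphs (adjacency sets); Fig. 1 is not available,
# so these edges are assumptions consistent with the example in the text.
background = {            # background knowledge, real identities
    "Dave":  {"Fred", "Ed", "Carol", "Alice", "Harry"},
    "Fred":  {"Dave", "Ed", "Carol", "Bob"},
    "Ed":    {"Dave", "Fred", "Harry"},
    "Carol": {"Dave", "Fred"},
    "Harry": {"Dave", "Ed"},
    "Alice": {"Dave"},
    "Bob":   {"Fred"},
}
sanitized = {             # anonymized dataset, numeric pseudonyms
    1: {3, 7},
    2: {3, 5, 6, 7},
    3: {1, 2, 4, 6, 7},
    4: {3},
    5: {2},
    6: {2, 3},
    7: {1, 2, 3},
}

def by_degree(graph, nodes):
    """Sort nodes by descending degree (ties broken by name)."""
    return sorted(nodes, key=lambda n: (-len(graph[n]), str(n)))

def degree_seeds(src, tgt, k=2):
    """Global step: pair the k highest-degree nodes rank by rank.
    Assumes the top-k degrees are unique in both graphs."""
    return dict(zip(by_degree(src, src)[:k], by_degree(tgt, tgt)[:k]))

def propagate(src, tgt, seeds):
    """Local step: map an unmapped node when exactly one unused target
    node is adjacent to all of its already-mapped neighbors and has
    the same degree. Repeat until no new mapping is found."""
    mapping = dict(seeds)
    used = set(mapping.values())
    changed = True
    while changed:
        changed = False
        unmapped = [n for n in src if n not in mapping]
        for u in by_degree(src, unmapped):
            anchors = {mapping[v] for v in src[u] if v in mapping}
            if not anchors:
                continue  # no re-identified neighbor yet
            candidates = [c for c in tgt
                          if c not in used
                          and anchors <= tgt[c]
                          and len(tgt[c]) == len(src[u])]
            if len(candidates) == 1:  # accept only unambiguous matches
                mapping[u] = candidates[0]
                used.add(candidates[0])
                changed = True
    return mapping

seeds = degree_seeds(background, sanitized)
result = propagate(background, sanitized, seeds)
print(result)
```

On these toy graphs the seeds are Dave ↔ 3 and Fred ↔ 2, and propagation then recovers Ed ↔ 7, Carol ↔ 6 and Harry ↔ 1, matching the mappings named in the example. Accepting only unambiguous candidates keeps the sketch free of wrong mappings at the cost of leaving structurally indistinguishable nodes unmapped.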
doi:10.3311/ppee.7878 fatcat:f56zkd37rrcu3ibhnxui4jgfb4