Email alias detection using social network analysis

Ralf Hölzer, Bradley Malin, Latanya Sweeney
Proceedings of the 3rd International Workshop on Link Discovery (LinkKDD '05), 2005
This research addresses the problem of correctly relating aliases that belong to the same entity. Previous approaches focused on natural language processing and structured data, whereas in this research we analyze the local association, or "social," network in which aliases reside. The network is constructed from email data mined from the Internet: links in the network represent web pages on which two email addresses are collocated. The problem is defined as follows: given a social network S, constructed from email address collocations, and an email address E, identify any aliases for E that also appear in S. The alias detection methods are evaluated on a data set of over 14,000 University X email addresses for which ground-truth relations are known. The results are reported as partial lists of k candidate aliases, ranked by predicted relational strength within the network. Using only the best-ranked candidate, 2% of email addresses are correctly linked to another alias that corresponds to the same entity, which is significantly better than the random (0.007%) and geodesic-distance (1%) baseline predictions. Correct linkages increase to 15% within top-10 rank lists (0.07% of all emails) and to 30% within top-100 rank lists (0.7% of all emails).
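The setup in the abstract (a collocation network, candidates ranked per source address, and top-k evaluation against known aliases) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `pages` is a hypothetical mapping from web-page IDs to the addresses found on each page, the sample addresses are invented, and every function name is hypothetical. `geodesic_ranking` corresponds to the geodesic-distance baseline named above, while `common_neighbor_ranking` is a stand-in strength score chosen only for illustration; the abstract does not specify the paper's own relational-strength measure.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_collocation_network(pages):
    """Undirected weighted graph: edge weight = number of pages on which
    two addresses co-occur (pages is a hypothetical {page_id: [emails]} map)."""
    graph = defaultdict(lambda: defaultdict(int))
    for emails in pages.values():
        for a, b in combinations(sorted(set(emails)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def geodesic_ranking(graph, source):
    """Baseline: rank reachable addresses by hop distance from source (BFS).
    Ties are broken alphabetically so the output is deterministic."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, {}):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return sorted((n for n in dist if n != source), key=lambda n: (dist[n], n))


def common_neighbor_ranking(graph, source):
    """Illustrative strength score (an assumption, not the paper's measure):
    sum over shared neighbours of the weaker of the two collocation counts."""
    scores = defaultdict(float)
    for mid, w1 in graph.get(source, {}).items():
        for cand, w2 in graph.get(mid, {}).items():
            if cand != source:
                scores[cand] += min(w1, w2)
    return sorted(scores, key=lambda n: (-scores[n], n))


def top_k_hit_rate(graph, true_aliases, rank_fn, k):
    """Fraction of source addresses with at least one true alias in the top k."""
    hits = sum(
        1
        for source, aliases in true_aliases.items()
        if set(rank_fn(graph, source)[:k]) & aliases
    )
    return hits / len(true_aliases)


# Toy data (invented): alice@ and a.smith@ are aliases that never share a page
# but are collocated with the same people.
pages = {
    "p1": ["alice@x.edu", "bob@x.edu"],
    "p2": ["alice@x.edu", "carol@x.edu"],
    "p3": ["a.smith@x.edu", "bob@x.edu"],
    "p4": ["a.smith@x.edu", "carol@x.edu"],
    "p5": ["alice@x.edu", "dave@x.edu"],
    "p6": ["dave@x.edu", "erin@x.edu"],
}
true_aliases = {"alice@x.edu": {"a.smith@x.edu"}, "a.smith@x.edu": {"alice@x.edu"}}

graph = build_collocation_network(pages)
print(top_k_hit_rate(graph, true_aliases, geodesic_ranking, 1))         # 0.0
print(top_k_hit_rate(graph, true_aliases, geodesic_ranking, 3))         # 0.5
print(top_k_hit_rate(graph, true_aliases, common_neighbor_ranking, 1))  # 1.0
```

In this toy graph the geodesic baseline places the true alias behind directly collocated addresses, whereas the shared-neighbour score ranks it first. This is one way a strength measure can beat pure distance, although it is not evidence about the paper's method. The percentages in parentheses in the abstract follow from list size over corpus size: with roughly 14,000 addresses, a top-k list covers about k/14,000 of them, so 1/14,000 ≈ 0.007%, 10/14,000 ≈ 0.07%, and 100/14,000 ≈ 0.7%.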
doi:10.1145/1134271.1134279 dblp:conf/kdd/HolzerMS05 fatcat:kkzns2kaing7xbl2yld44is7du