A comparison of retrieval-based hierarchical clustering approaches to person name disambiguation

Christof Monz, Wouter Weerkamp
2009 Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09  
This paper describes a simple clustering approach to person name disambiguation of retrieved documents. The methods are based on standard IR concepts and do not require any task-specific features.  ...  Despite their simplicity these approaches achieve very competitive performance.  ...  RETRIEVAL-BASED CLUSTERING There are a number of well-established clustering approaches that have been used in various machine learning tasks, with K-Means clustering and agglomerative hierarchical clustering  ... 
doi:10.1145/1571941.1572060 dblp:conf/sigir/MonzW09 fatcat:ge7md3yak5hxnbie2nc4zrv4ci

Semi-Supervised Personal Name Disambiguation Technique for the Web

P. Selvaperumal, A. Suruliandi
2016 International Journal of Modern Education and Computer Science  
Personal name disambiguation involves disambiguating the name by clustering web page collection such that each cluster represents a person having the ambiguous name.  ...  In this paper, a personal name disambiguation technique that makes use of rich set of features like Nouns, Noun phrases, and frequent keywords as features is proposed.  ...  Justified by this, a semi-supervised learning based name disambiguation process is proposed to disambiguate personal names in the web pages.  ... 
doi:10.5815/ijmecs.2016.03.04 fatcat:4lr5mz5pnjazfhfvnfp6cskczm

Chinese Personal Name Disambiguation Based on Clustering

Chao Fan, Yu Li, Shan Zhong
2021 Wireless Communications and Mobile Computing  
Finally, a labeling approach is employed to bring forward feature words that can represent each cluster. The experiment achieves a good result for five groups of Chinese personal names.  ...  This research explores the Chinese personal name disambiguation based on clustering technique. Preprocessing is applied to transform raw corpus into standardized format at the beginning.  ...  [1] exploited two topic-based models to extract features from corpus and achieved a good effect for personal name disambiguation. Zhao et al.  ... 
doi:10.1155/2021/3790176 fatcat:do4uzskhsff4hhxpoe7eiw65um

A Real-time Heuristic-based Unsupervised Method for Name Disambiguation in Digital Libraries

Muhammad Imran, Syed Zeeshan Haider Gillani, Maurizio Marchese
2013 D-Lib Magazine  
., creating a multilayered hierarchical clustering algorithm which transforms itself according to the available information, and forms clusters of unambiguous records.  ...  We propose a heuristic-based, unsupervised and adaptive method that also examines users' interactions in order to include users' feedback in the disambiguation process.  ...  We use a multilayer hierarchical clustering approach based on divisive approach, the K-means algorithm and agglomerative approach.  ... 
doi:10.1045/september2013-imran fatcat:re73tl6bfjcl3optnq62pu7ufu

Bootstrapping Wikipedia to answer ambiguous person name queries

Toni Gruetze, Gjergji Kasneci, Zhe Zuo, Felix Naumann
2014 2014 IEEE 30th International Conference on Data Engineering Workshops  
A possible approach to solve this problem is to cluster the results, so that each cluster represents one of the persons occurring in the answer set.  ...  We have evaluated our methods on a hand-labeled dataset of around 5,000 Web pages retrieved from Google queries on 50 ambiguous person names.  ...  However, Balog et al. compare the performance of a simple bag-of-word based clustering approaches for the personal name resolution task and showed comparable results to state-of-the-art approaches that  ... 
doi:10.1109/icdew.2014.6818303 dblp:conf/icde/GrutzeKZN14 fatcat:wxwuwip2hbfzbcejg7m5jeqxwe

Applying Semantic Social Graphs to Disambiguate Identity References [chapter]

Matthew Rowe
2009 Lecture Notes in Computer Science  
Person disambiguation monitors web appearances of a person by disambiguating information belonging to different people sharing the same name.  ...  We present a new distance measure called "Optimum Transitions" and evaluate the accuracy of our approach using the information retrieval measure f-measure.  ...  Identity Disambiguation Work by [16] uses an unsupervised method to perform person disambiguation by searching for web pages using a person name, and clustering web pages according to the community  ... 
doi:10.1007/978-3-642-02121-3_35 fatcat:7u4ybgx3qna73gzxtev6m2r5im

Name Disambiguation in Anonymized Graphs using Network Embedding [article]

Baichuan Zhang, Mohammad Al Hasan
2017 arXiv   pre-print
In the methodological aspect, the proposed method uses a novel representation learning model to embed each document in a low dimensional vector space where name disambiguation can be solved by a hierarchical  ...  To resolve this issue, the name disambiguation task is designed which aims to partition the documents associated with a name reference such that each partition contains documents pertaining to a unique  ...  Performance Comparison over the Number of Clusters One of the potential problems for name disambiguation is to determine the number of real-life persons L under a given name reference, because in real-life  ... 
arXiv:1702.02287v4 fatcat:wzzuhqlrvbaine5ajd3uhm2c5q

Name Disambiguation in Anonymized Graphs using Network Embedding

Baichuan Zhang, Mohammad Al Hasan
2017 Proceedings of the 2017 ACM on Conference on Information and Knowledge Management - CIKM '17  
In the methodological aspect, the proposed method uses a novel representation learning model to embed each document in a low dimensional vector space where name disambiguation can be solved by a hierarchical  ...  To resolve this issue, the name disambiguation task is designed which aims to partition the documents associated with a name reference such that each partition contains documents pertaining to a unique  ...  Performance Comparison over the Number of Clusters One of the potential problems for name disambiguation is to determine the number of real-life persons L under a given name reference, because in real-life  ... 
doi:10.1145/3132847.3132873 dblp:conf/cikm/ZhangH17 fatcat:qzft4eecobhrjhjvh7sijcfqiu

Disambiguating fine-grained place names from descriptions by clustering [article]

Hao Chen, Maria Vasardani, Stephan Winter
2018 arXiv   pre-print
For this purpose, we evaluate the performance of different existing clustering-based approaches, since clustering approaches require no more knowledge other than the locations of ambiguous place names.  ...  Everyday place descriptions often contain place names of fine-grained features, such as buildings or businesses, that are more difficult to disambiguate than names referring to larger places, for example  ...  We focus on clustering-based disambiguation approaches, as clustering approaches require minimum pre-knowledge of the place names to be disambiguated compared to knowledge- and machine learning-based approaches  ... 
arXiv:1808.05946v1 fatcat:zl4ksyooqvd6rcljmrg3cwjx3u

Efficient topic-based unsupervised name disambiguation

Yang Song, Jian Huang, Isaac G. Councill, Jia Li, C. Lee Giles
2007 Proceedings of the 2007 conference on Digital libraries - JCDL '07  
In this paper, we focus on the problem of disambiguating person names within web pages and scientific documents. We present an efficient and effective two-stage approach to disambiguate names.  ...  Our models explicitly introduce a new variable for persons and learn the distribution of topics with regard to persons and words.  ...  For the purpose of name disambiguation, the topic-name matrix is processed further with a hierarchical clustering method.  ... 
doi:10.1145/1255175.1255243 dblp:conf/jcdl/SongHCLG07 fatcat:k26gwgsok5cqnas7uapu2obzhy

A Network Analysis Model for Disambiguation of Names in Lists

Bradley Malin, Edoardo Airoldi, Kathleen M. Carley
2005 Computational and mathematical organization theory  
Using a global similarity threshold, we demonstrate random walks achieve a significant increase in disambiguation capability in comparison to prior models.  ...  Prior name disambiguation methods measured similarity between two names as a function of their respective documents.  ...  A failure occurs every time a random walk from a to b is terminated because it reaches the maximum number of steps, rather than because it reaches its target node, i.e., b in this case.  ... 
doi:10.1007/s10588-005-3940-3 fatcat:qsspdvcopjdhfd6f7mvjpx3uuq

Streaming Cross Document Entity Coreference Resolution

Delip Rao, Paul McNamee, Mark Dredze
2010 International Conference on Computational Linguistics  
As a consequence, the dominant approach is based on greedy agglomerative clustering techniques that utilize pairwise vector comparisons and thus require O(n^2) space and time.  ...  We show that our approach scales to at least an order of magnitude larger data than previous reported methods.  ...  This data exhibits no name variants and is strictly a disambiguation task. We include this data (smith) to allow comparison to previous work.  ... 
dblp:conf/coling/RaoMD10 fatcat:ibcy7bptprhdfew6fxscjg63d4

A Multi-Level Author Name Disambiguation Algorithm

Siyang Zhang, Xinhua E, Tian Pan
2019 IEEE Access  
This algorithm is mainly based on the unsupervised algorithm, which combines hierarchical agglomerative clustering (HAC) and graph theory for disambiguating.  ...  In this paper, we propose a multi-level name disambiguation algorithm.  ...  In this paper, we considered several baseline methods based on Hierarchical Agglomerative Clustering (HAC) [2] and Graph [3]. For a fair comparison, we use the same feature.  ... 
doi:10.1109/access.2019.2931592 fatcat:hgvcdv6uhjevjlk5nqvqfu6tvi

An MCL-Based Text Mining Approach for Namesake Disambiguation on the Web

Tarique Anwar, Muhammad Abulaish
2012 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology  
In this paper, we propose a Markov CLustering (MCL) based text mining approach for namesake disambiguation on the Web.  ...  The novelty of the proposed technique lies in modeling the collection of webpages using a weighted graph structure and applying MCL to crystalize it into different clusters, each one containing the webpages  ...  ACKNOWLEDGMENT The authors would like to thank King Abdulaziz City for Science and Technology (KACST) and King Saud University for their support.  ... 
doi:10.1109/wi-iat.2012.239 dblp:conf/webi/AnwarA12 fatcat:gflbbvxnlrao7jwuqcllczvn7u

AUTOMATIC ANNOTATION OF AMBIGUOUS PERSONAL NAMES ON THE WEB

Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka
2012 Computational intelligence  
A TEM of a person captures named entities and attribute values that are useful to disambiguate that person from his or her namesakes (i.e. different people who share the same name).  ...  We then use group average agglomerative clustering to identify the instances of an ambiguous name that belong to the same person. Ideally, each cluster must represent a different namesake.  ...  Acknowledgements We would like to extend our sincere thanks to the anonymous reviewers who provided invaluable comments that have improved the quality of the paper.  ... 
doi:10.1111/j.1467-8640.2012.00449.x fatcat:lzsv2j75ofd5zcc43wsx6mdrs4