Unsupervised learning of edit parameters for matching name variants

Dan Gillick, Dilek Hakkani-Tür, Michael Levit
2008 Interspeech 2008   unpublished
We introduce a novel unsupervised method for learning spelling edit probabilities which improves overall F-Measure on our own name-matching task by 12%.  ...  Our approach is a generalization of spelling correction: We compare to candidate matches by applying a set of edits to an input name.  ...  Lastly, the unsupervised method for learning edit probabilities proved quite successful.  ... 
doi:10.21437/interspeech.2008-77 fatcat:uchne7fblfa6tiawe7h3qpxfhe

Adaptive name matching in information integration

M. Bilenko, R. Mooney, W. Cohen, P. Ravikumar, S. Fienberg
2003 IEEE Intelligent Systems  
The authors compare and describe methods for combining and learning similarity measures for name matching.  ...  Let s_i denote the ith letter of s, and, similarly, let t_j be the jth letter of t.  ...  It was also supported by a contract from the Army Research Office to the Center for Computer and Communications Security with Carnegie Mellon University and by a faculty fellowship from IBM.  ... 
doi:10.1109/mis.2003.1234765 fatcat:b4s7ziec3ba5bhem4ay5i6qxmy

Word Embedding based Edit Distance [article]

Yilin Niu, Chao Qiao, Hang Li, Minlie Huang
2018 arXiv   pre-print
In this short paper, we address unsupervised learning for text similarity calculation.  ...  Experiments on three benchmark datasets show WED outperforms state-of-the-art unsupervised methods including edit distance, TF-IDF based cosine, word embedding based cosine, Jaccard index, etc.  ...  Conclusion We have proposed a new method named Word Embedding based Edit Distance (WED) for unsupervised text similarity calculation.  ... 
arXiv:1810.10752v1 fatcat:ichpbwfuqraxneyukkvqeak67a

CGMN: A Contrastive Graph Matching Network for Self-Supervised Graph Similarity Learning [article]

Di Jin, Luzhi Wang, Yizhen Zheng, Xiang Li, Fei Jiang, Wei Lin, Shirui Pan
2022 arXiv   pre-print
Then, we employ two strategies, namely cross-view interaction and cross-graph interaction, for effective node representation learning.  ...  To this end, we propose a contrastive graph matching network (CGMN) for self-supervised graph similarity learning in order to calculate the similarity between any two input graph objects.  ...  Acknowledgments This work was partly supported by the National Natural Science Foundation of China under grants 61876128 and Meituan Project.  ... 
arXiv:2205.15083v2 fatcat:c6aqnv27nvcx7c7lkq4tl43nwm

Author Name Disambiguation in Bibliographic Databases: A Survey [article]

Muhammad Shoaib, Ali Daud, Tehmina Amjad
2020 arXiv   pre-print
Author Name Disambiguation (AND) in Bibliographic Databases (BD) like DBLP, Citeseer, and Scopus is a specialized field of entity resolution.  ...  Categorization and elaboration of similarity metrics and methods are also provided. Finally, future directions and recommendations are given for this dynamic area of research.  ...  Acknowledgement We are grateful to the Higher Education Commission (HEC) of Pakistan for their financial assistance to promote the research trend in the country under Indigenous 5000 Fellowship Program  ... 
arXiv:2004.06391v1 fatcat:g6ohfpzeejbwhlxmt7vlmyjqo4

Name Phylogeny: A Generative Model of String Variation

Nicholas Andrews, Jason Eisner, Mark Dredze
2012 Conference on Empirical Methods in Natural Language Processing  
The final learned transducer can quickly link any test name into the final phylogeny, thereby locating variants of the test name.  ...  Our variational EM learning algorithm alternately reestimates this phylogeny and the transducer parameters.  ...  Wikipedia documents many variant names for entities.  ... 
dblp:conf/emnlp/AndrewsED12 fatcat:rlscp5wpgvb3zjcogooobi4vbu

Deduplicating a places database

Nilesh Dalvi, Marian Olteanu, Manish Raghavan, Philip Bohannon
2014 Proceedings of the 23rd international conference on World wide web - WWW '14  
We also present unsupervised techniques that can learn such a model from a database of places.  ...  We consider the problem of resolving duplicates in a database of places, where a place is defined as any entity that has a name and a physical location.  ...  ACKNOWLEDGEMENTS The authors gratefully acknowledge Kedar Bellare for many helpful discussions on the technical material, and Justin Moore and Long Chen for helpful suggestions on technical and presentation  ... 
doi:10.1145/2566486.2568034 dblp:conf/www/DalviORB14 fatcat:pvbkjaiqrvefxitp55v3oeydge

Do we need bigram alignment models? On the effect of alignment quality on transduction accuracy in G2P

Steffen Eger
2015 Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing  
We investigate the need for bigram alignment models and the benefit of supervised alignment techniques in grapheme-to-phoneme (G2P) conversion.  ...  Moreover, we find that supervised alignment techniques may perform considerably better than their unsupervised brethren and that few manually aligned training pairs suffice for them to do so.  ...  Acknowledgments I thank three anonymous reviewers and Tim vor der Brück for valuable suggestions.  ... 
doi:10.18653/v1/d15-1139 dblp:conf/emnlp/Eger15 fatcat:qwrcqpfufve4pd2efw54pkkqrm

A Comparison of String Distance Metrics for Name-Matching Tasks

William W. Cohen, Pradeep Ravikumar, Stephen E. Fienberg
2003 International Joint Conference on Artificial Intelligence  
Using an open-source, Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names.  ...  We investigate a number of different metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods.  ...  EIA-0131884 to the National Institute of Statistical Sciences and by a contract from the Army Research Office to the Center for Computer and Communications Security with Carnegie Mellon University.  ... 
dblp:conf/ijcai/CohenRF03 fatcat:cvfy4fqbhjdfbfvyihzfwiwwsa

UHP-SOT: An Unsupervised High-Performance Single Object Tracker [article]

Zhiruo Zhou, Hongyu Fu, Suya You, Christoph C. Borel-Donohue, C.-C. Jay Kuo
2021 arXiv   pre-print
An unsupervised online object tracking method that exploits both foreground and background correlations is proposed and named UHP-SOT (Unsupervised High-Performance Single Object Tracker) in this work.  ...  deep-learning-based SOT methods, and operates at a fast speed (i.e. 22.7-32.0 FPS on a CPU).  ...  Generally speaking, they conduct dense sampling around the object patch and solve a rigid regression problem to learn a template for similarity matching.  ... 
arXiv:2110.01812v1 fatcat:afdsuiwnivanncbcf4ecz2bolm

WERD: Using social text spelling variants for evaluating dialectal speech recognition

Ahmed Ali, Preslav Nakov, Peter Bell, Steve Renals
2017 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)  
In particular, in the process of comparing a hypothesis to a reference, we make use of spelling variants for words and phrases, which we mine from Twitter in an unsupervised fashion.  ...  Our experiments with evaluating ASR output for Egyptian Arabic, and further manual analysis, show that the resulting WERd (i.e., WER for dialects) metric, a variant of TERp, is more adequate than WER for  ...  Using the Spelling Variants for Evaluation: WERd We borrow ideas from an evaluation measure for MT evaluation, namely Translation Edit Rate Plus or TERp [22].  ... 
doi:10.1109/asru.2017.8268928 dblp:conf/asru/AliN0R17 fatcat:mpdhm22ndrf3xb6yr62rp66oze

WERd: Using Social Text Spelling Variants for Evaluating Dialectal Speech Recognition [article]

Ahmed Ali, Preslav Nakov, Peter Bell, Steve Renals
2017 arXiv   pre-print
In particular, in the process of comparing a hypothesis to a reference, we make use of spelling variants for words and phrases, which we mine from Twitter in an unsupervised fashion.  ...  Our experiments with evaluating ASR output for Egyptian Arabic, and further manual analysis, show that the resulting WERd (i.e., WER for dialects) metric, a variant of TERp, is more adequate than WER for  ...  Using the Spelling Variants for Evaluation: WERd We borrow ideas from an evaluation measure for MT evaluation, namely Translation Edit Rate Plus or TERp [22].  ... 
arXiv:1709.07484v1 fatcat:mi3j5n7pafhzbkazem5mc5nx5q

(Almost) Total Recall - SYDNEY CMCRC at TAC 2012

Will Radford, Will Cannings, Joel Nothman, Daniel Tse, James R. Curran, Andrew Naoum, Glen Pink
2012 Text Analysis Conference  
We explore unsupervised and supervised whole-document approaches to English NEL with naïve and context clustering.  ...  Our best system uses unsupervised entity linking and naïve clustering and scores 66.5% B^3+ F1 score. Our KB clustering score is competitive with the top systems at 65.6%.  ...  Topic Modelling We trained an LDA model using the Vowpal Wabbit online machine learning toolkit, with training parameters k = 100 (the number of topics), α = 1, ρ = 0.1, on documents from TAC 09 queries  ... 
dblp:conf/tac/RadfordCNTCNP12 fatcat:t6n6gcdu7bfvfhwrqjaz2suysu

Unsupervised Learning of Link Discovery Configuration [chapter]

Andriy Nikolov, Mathieu d'Aquin, Enrico Motta
2012 Lecture Notes in Computer Science  
Existing solutions either rely on the user's knowledge of the data and the domain or on the use of machine learning to discover these parameters based on training data.  ...  In this paper, we present a novel approach to tackle the issue of data linking which relies on the unsupervised discovery of the required similarity parameters.  ...  Part of this research has been funded under the EC 7th Framework Programme, in the context of the SmartProducts project (231204).  ... 
doi:10.1007/978-3-642-30284-8_15 fatcat:m7tkzzcqsbfvncsdwjpwvq5j2y

Fast transcription of speech in low-resource languages [article]

Mark Hasegawa-Johnson, Camille Goudeseune, Gina-Anne Levow
2019 arXiv   pre-print
We present software that, in only a few hours, transcribes forty hours of recorded speech in a surprise language, using only a few tens of megabytes of noisy text in that language, and a zero-resource  ...  Acknowledgment This work was funded by the DARPA program "Low resource languages for emergent incidents (LORELEI)," DARPA-BAA-15-04.  ...  The name of our system, ASR24 [2], refers to the original task specification: the ASR must be designed, trained, and functioning within 24 hours of learning the identity of L, using only data found on  ... 
arXiv:1909.07285v1 fatcat:762mfmqh75aghb7q4ojph67seu