
Scaling Multiple-Source Entity Resolution using Statistically Efficient Transfer Learning [article]

Sahand Negahban, Benjamin I. P. Rubinstein, Jim Gemmell
2012 arXiv   pre-print
We consider a serious, previously-unexplored challenge facing almost all approaches to scaling up entity resolution (ER) to multiple data sources: the prohibitive cost of labeling training data for supervised  ...  We address this challenge with a brand new transfer learning algorithm which requires far less training data (or equivalently, achieves superior accuracy with the same data) and is trained using fast convex  ...  INTRODUCTION In this paper we investigate a serious and previously-unexplored challenge to scaling joint entity resolution (ER) to multiple sources: that of intractable labeling costs required to model  ... 
arXiv:1208.1860v1 fatcat:qb7asbtwtngnnfyotvmcmpys5u

Entity profiling with varying source reliabilities

Furong Li, Mong Li Lee, Wynne Hsu
2014 Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '14  
Scaling multiple-source entity resolution using statistically efficient transfer learning. In CIKM, 2012 2. Yin et al. TruthFinder. In KDD, 2007 3. Guo et al.  ...  matching decisions Address the problem of building entity profiles by collating data records from multiple sources in the presence of erroneous values  Interleave record linkage with truth discovery  ... 
doi:10.1145/2623330.2623685 dblp:conf/kdd/LiLH14 fatcat:lwxotg2szfgtbnj2iytkol4mfe


L. Xin
2018 The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences  
Utilizing high-resolution remote sensing images for earth observation has become the common method of land use monitoring.  ...  Whether in terms of efficiency or accuracy, deep learning method is more preponderant.  ...  In general, we use screenshots tools to capture building sample library of multi-source and multi-scale spatial resolution on Google Earth, covering domestic and foreign the building entity of different  ... 
doi:10.5194/isprs-archives-xlii-3-1959-2018 fatcat:ytztzjllarczhaajfrf3mqplwq

(Almost) All of Entity Resolution [article]

Olivier Binette, Rebecca C. Steorts
2022 arXiv   pre-print
bibliographic data, all these applications have a common theme - integrating information from multiple sources.  ...  We review clustering approaches to entity resolution, semi- and fully supervised methods, and canonicalization, which are being used throughout industry and academia in applications such as human rights  ...  This is discussed in Section 7. 1.3 Challenges of Entity Resolution Entity resolution is difficult because of the need to balance between: (1) efficient methods which scale to large databases, (2)  ... 
arXiv:2008.04443v3 fatcat:6tunuro7afhmbpambcn2bk32ly

Chapter 7 Scalable Knowledge Graph Processing Using SANSA [chapter]

Hajira Jabeen, Damien Graux, Gezim Sejdiu
2020 Lecture Notes in Computer Science  
After reading this chapter, the reader should have an understanding of the different layers and corresponding APIs available to handle Knowledge Graphs at scale using SANSA.  ...  SANSA is built using general-purpose processing engines Apache Spark and Apache Flink.  ...  In SANSA, we use scalable techniques like vectorization using hashingTF, count-vectorization and Locality Sensitive Hashing [190] to achieve almost linear performance for large-scale entity resolution  ... 
doi:10.1007/978-3-030-53199-7_7 fatcat:zx4suhofwngsxbigtars4y4s3u

Multi-source knowledge fusion: a survey

Xiaojuan Zhao, Yan Jia, Aiping Li, Rong Jiang, Yichen Song
2020 World wide web (Bussum)  
On this basis, the challenges and future research directions of multisource knowledge fusion in a large-scale knowledge base environment are discussed.  ...  Due to the uncertainty of knowledge acquisition, the reliability and confidence of KG based on entity recognition and relationship extraction technology need to be evaluated.  ... 
doi:10.1007/s11280-020-00811-0 fatcat:ef5j2sna6fai7k2455yihrrfuq

Multi-relational data mining 2004

Sašo Džeroski, Hendrik Blockeel
2004 SIGKDD Explorations  
Autocorrelation is a statistical dependency between the values of the same variable on related entities.  ...  Bhattacharya and Getoor study the problem of entity resolution in a context where objects are of different types.  ... 
doi:10.1145/1046456.1046481 fatcat:u363hoo33vdyrhdaqpoqqbpj5e

Front Matter: Volume 11756

Lynne L. Grewe, Erik P. Blasch, Ivan Kadar
2021 Signal Processing, Sensor/Information Fusion, and Target Recognition XXX  
using a Base 36 numbering system employing both numerals and letters.  ...  a deep learning architecture 11756 0S Fairness-by-design Dempster-Shafer reasoning system 11756 0T OCULUS iCrowd: an integrated C2I and simulation environment for security management, anomaly detection  ...  It fuses data from multiple sources via sentiment analysis, with a multi-layer knowledge graph to connect an entity and its behavior.  ... 
doi:10.1117/12.2598593 fatcat:5afkuwltljctxayaup2rz2njly

Natural Language Processing for Information Extraction [article]

Sonit Singh
2018 arXiv   pre-print
Various sub-tasks of IE such as Named Entity Recognition, Coreference Resolution, Named Entity Linking, Relation Extraction, Knowledge Base reasoning forms the building blocks of various high end Natural  ...  Much of this data lies in unstructured form and manually managing and effectively making use of it is tedious, boring and labor intensive.  ...  .,2015) considers the multiple types of entities and relations simultaneously, and replaces transfer matrix by the product of two projection vectors of an entity relation pair.  ... 
arXiv:1807.02383v1 fatcat:3bdyidbjp5hn7c2w4iqve4ajvi

The Computational Limits of Deep Learning [article]

Neil C. Thompson, Kristjan Greenewald, Keeheon Lee, Gabriel F. Manso
2020 arXiv   pre-print
Thus, continued progress in these applications will require dramatically more computationally-efficient methods, which will either have to come from changes to deep learning or from moving to other machine  ...  learning methods.  ...  Statistical optimality of stochastic gradient descent on hard learning problems through multiple passes.  ... 
arXiv:2007.05558v1 fatcat:w2grqtaksjaydk4o64rpegpfxu

One Model to Recognize Them All: Marginal Distillation from NER Models with Different Tag Sets [article]

Keunwoo Peter Yu, Yi Yang
2020 arXiv   pre-print
However, given a particular downstream application, there is often no single NER resource that supports all the desired entity types, so users must leverage multiple resources with different tag sets.  ...  Named entity recognition (NER) is a fundamental component in the modern language understanding pipeline. Public NER resources such as annotated data and model services are available in many domains.  ...  Progressive learning falls in a more specific transfer learning category, in which we need to transfer knowledge from a source model using a target dataset that involves additional labels.  ... 
arXiv:2004.05140v2 fatcat:g4zrgjapj5eqdhkrsxxdb4ftc4

Gradual Machine Learning for Entity Resolution [article]

Boyi Hou, Qun Chen, Yanyan Wang, Youcef Nafa, Zhanhuai Li
2019 arXiv   pre-print
Usually considered as a classification problem, entity resolution (ER) can be very challenging on real data due to the prevalence of dirty values.  ...  Using ER as a test case, we demonstrate that gradual machine learning is a promising paradigm potentially applicable to other challenging classification tasks requiring extensive labeling effort.  ...  Transfer learning focuses on using the labeled training data in a domain to help learning in another target domain.  ... 
arXiv:1810.12125v4 fatcat:bo7kmdgprjd7fh6wd2uikxedsu

Adapting to the Long Tail: A Meta-Analysis of Transfer Learning Research for Language Understanding Tasks [article]

Aakanksha Naik, Jill Lehman, Carolyn Rose
2021 arXiv   pre-print
We assess trends in transfer learning research through a qualitative meta-analysis of 100 representative papers on transfer learning for NLU.  ...  Our answers to these questions highlight major avenues for future research in transfer learning for the long tail.  ...  Acknowledgements This research was supported in part by the Intramural Research Program of the National Institutes of Health, Clinical Research Center and through an Inter-Agency Agreement with the US  ... 
arXiv:2111.01340v1 fatcat:rlg77auu3zfwdggxy3kwm7fu2m

Machine Learning-Based Semantic Entity Alignment for Multi-Source Data: a Systematic Literature Review

Alex Boyko, Siamak Farshidi, Zhiming Zhao
2021 Zenodo  
It has become more common to store data about real-world entities, where this data is often distributed across multiple data sources.  ...  Many machine learning-based semantic entity alignment approaches have been proposed by the recent studies in the field.  ...  multiple sources.  ... 
doi:10.5281/zenodo.6328248 fatcat:kl4julgduffzzhyxztsfxzsw3a

End-to-End Entity Resolution for Big Data: A Survey [article]

Vassilis Christophides, Vasilis Efthymiou, Themis Palpanas, George Papadakis, Kostas Stefanidis
2020 arXiv   pre-print
structuredness, extreme diversity, high speed and large scale of entity descriptions used by real-world applications.  ...  One of the most important tasks for improving data quality and the reliability of data analytics results is Entity Resolution (ER).  ...  learning [172]   WS Transfer learning for RDF [151]  RB WS Unsupervised ensemble [93]   U Unsupervised ER for RDF [101]   U Large-scale Collective ER [147]  RB  How  ... 
arXiv:1905.06397v3 fatcat:rs2qoolz2jcppklriew5pjfefq