A Systematic Investigation of KB-Text Embedding Alignment at Scale

Vardaan Pahuja, Yu Gu, Wenhu Chen, Mehdi Bahrami, Lei Liu, Wei-Peng Chen, Yu Su
2021 Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)   unpublished
We conduct a large-scale, systematic investigation of aligning KB and text embeddings for joint reasoning.  ...  We set up a novel evaluation framework with two evaluation tasks, few-shot link prediction and analogical reasoning, and evaluate an array of KB-text embedding alignment methods.  ...  This research was sponsored by a gift grant from Fujitsu and the Ohio Supercomputer Center (Center, 1987).  ... 
doi:10.18653/v1/2021.acl-long.139 fatcat:l5ox7aqdhfetpkdeqvhmbrg7xm

XLORE2: Large-scale Cross-lingual Knowledge Graph Construction and Application

Hailong Jin, Chengjiang Li, Jing Zhang, Lei Hou, Juanzi Li, Peng Zhang
2019 Data Intelligence  
But the number of aligned properties is quite small for such a large-scale KB.  ...  In this paper, we present XLORE2, an extension of XLORE, as a holistic approach to the creation of a large-scale English-Chinese bilingual KB, to adequately answer the above problems.  ...  XLORE is the first large-scale cross-lingual KB with a balanced amount of Chinese-English knowledge.  ... 
doi:10.1162/dint_a_00003 dblp:journals/dint/JinLZHLZ19 fatcat:zdcq2gfyirc2ba4s6scxtytsna

Exploring the Combination of Contextual Word Embeddings and Knowledge Graph Embeddings [article]

Lea Dieudonat, Kelvin Han, Phyllicia Leavitt, Esteban Marquer
2020 arXiv   pre-print
performance of contextual and KB embeddings.  ...  In this work, we begin exploring another approach using contextual and KB embeddings jointly at the same level and propose two tasks -- an entity typing and a relation typing task -- that evaluate the  ...  This allows us to have a systematic classification process, with a single code applied on the embeddings regardless of their input and of the embedding size.  ... 
arXiv:2004.08371v1 fatcat:jyj5ce5wifbq5frb63max4rcye

An evaluation of forensic similarity hashes

Vassil Roussev
2011 Digital Investigation. The International Journal of Digital Forensics and Incident Response  
This study provides a baseline evaluation of the capabilities of these tools both in a controlled environment and on real-world data.  ...  Keywords: Similarity hash, Similarity digest, Sdhash, Ssdeep. Abstract: The fast growth of the average size of digital forensic targets demands new automated means to quickly, accurately and reliably correlate  ...  For example, for the 256 KB case we consider the embedding (at random) of 4 pieces of 32 KB and 8 pieces of 16 KB. Table 4 shows the fraction of runs that produce results greater than zero.  ... 
doi:10.1016/j.diin.2011.05.005 fatcat:nygqfuuif5b2xda5slroi7lnxu

Chemical-induced disease relation extraction via attention-based distant supervision

Jinghang Gu, Fuqing Sun, Longhua Qian, Guodong Zhou
2019 BMC Bioinformatics  
An attention-based neural network and a stacked auto-encoder network are applied respectively to induce learning models and extract relations at both levels.  ...  Supervised machine learning provides a feasible solution to automatically extract relations between biomedical entities from scientific literature, its success, however, heavily depends on large-scale  ...  Availability of data and materials The BioCreative V CDR corpus can be downloaded from https://biocreative.bioinformatics.udel.edu/resources/corpora/biocreative-v-cdr-corpus/, and the CTD database can be  ... 
doi:10.1186/s12859-019-2884-4 fatcat:h2xnpdzlfffzrjqphubauszqp4

Ontology Reuse: the Real Test of Ontological Design [article]

Piotr Sowinski, Katarzyna Wasielewska-Michniewska, Maria Ganzha, Marcin Paprzycki, Costin Badica
2022 arXiv   pre-print
Moreover, despite recent advances, the realization of systematic ontology quality assurance remains a difficult problem.  ...  In this work, the quality of thirty biomedical ontologies, and the Computer Science Ontology are investigated, from the perspective of a practical use case.  ...  This is done using a text index. (3) The neighborhoods (most closely related entities) of the candidate entities are retrieved from the ontologies. (4) Using text embeddings and string matching, the  ... 
arXiv:2205.02892v2 fatcat:vyjky7gwifac5ejruvilpl4tci

Knowledge Base Question Answering: A Semantic Parsing Perspective [article]

Yu Gu, Vardaan Pahuja, Gong Cheng, Yu Su
2022 arXiv   pre-print
In this survey, we situate KBQA in the broader literature of semantic parsing and give a comprehensive account of how existing KBQA approaches attempt to address the unique challenges.  ...  Improvement has since been made in many downstream tasks, including natural language interface to web APIs, text-to-SQL generation, among others.  ...  Das et al. [2021] revise the inaccurate schema items by aligning them with items in the neighborhood of the topic entity using both string-level and embedding-level similarities.  ... 
arXiv:2209.04994v3 fatcat:nr6o4y6aubbyzmswmtoebuaj5q

Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases [article]

Yu Gu, Sue Kase, Michelle Vanni, Brian Sadler, Percy Liang, Xifeng Yan, Yu Su
2021 arXiv   pre-print
To facilitate the development of KBQA models with stronger generalization, we construct and release a new large-scale, high-quality dataset with 64,331 questions, GrailQA, and provide evaluation settings  ...  The combination of our dataset and model enables us to thoroughly examine and demonstrate, for the first time, the key role of pre-trained contextual embeddings like BERT in the generalization of KBQA.  ...  This research was sponsored in part by NSF 1528175, a Fujitsu gift grant and Ohio Supercomputer Center [8].  ... 
arXiv:2011.07743v4 fatcat:sehx4775mrhmngcf4ywnr7ydce

Improving Candidate Generation for Low-resource Cross-lingual Entity Linking [article]

Shuyan Zhou and Shruti Rijhwani and John Wieting and Jaime Carbonell and Graham Neubig
2020 arXiv   pre-print
Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts.  ...  The first step of (X)EL is candidate generation, which retrieves a list of plausible candidate entities from the target-language KB for each mention.  ...  Shruti Rijhwani is supported by a Bloomberg Data Science Ph.D. Fellowship.  ... 
arXiv:2003.01343v1 fatcat:mfwnsmnqy5elbbacl7v4e5o77e

Improving Candidate Generation for Low-resource Cross-lingual Entity Linking

Shuyan Zhou, Shruti Rijhwani, John Wieting, Jaime Carbonell, Graham Neubig
2020 Transactions of the Association for Computational Linguistics  
Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts.  ...  The first step of (X)EL is candidate generation, which retrieves a list of plausible candidate entities from the target-language KB for each mention.  ...  Shruti Rijhwani is supported by a Bloomberg Data Science Ph.D. Fellowship.  ... 
doi:10.1162/tacl_a_00303 fatcat:aj2slke6xraclbl5asibjsvl64

Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases [article]

Gerhard Weikum, Luna Dong, Simon Razniewski, Fabian Suchanek
2021 arXiv   pre-print
Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines  ...  Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI.  ...  Acknowledgements Many thanks to Mike Casey and other staff at NOW Publishers.  ... 
arXiv:2009.11564v2 fatcat:vh2lqfmhhbcwpf6dcsej3hhvgy

Machine Translation using Semantic Web Technologies: A Survey

Diego Moussallem, Matthias Wauer, Axel-Cyrille Ngonga Ngomo
2018 Journal of Web Semantics  
This article presents the results of a systematic review of machine translation approaches that rely on Semantic Web technologies for translating texts.  ...  A large number of machine translation approaches have recently been developed to facilitate the fluid migration of content across languages.  ...  deal with data at Web scale.  ... 
doi:10.1016/j.websem.2018.07.001 fatcat:azjfl5e77fg5xkjiq7lzp44hnm

Improving Compositional Generalization in Semantic Parsing [article]

Inbar Oren, Jonathan Herzig, Nitish Gupta, Matt Gardner, Jonathan Berant
2020 arXiv   pre-print
In this work, we investigate compositional generalization in semantic parsing, a natural test-bed for compositional generalization, as output programs are constructed from sub-components.  ...  We analyze a wide variety of models and propose multiple extensions to the attention module of the semantic parser, aiming to improve compositional generalization.  ...  The second author was supported by a Google PhD fellowship.  ... 
arXiv:2010.05647v1 fatcat:mj7nlhfbizgbdbmwk4yrbezd4m

Universal Representation Learning of Knowledge Bases by Jointly Embedding Instances and Ontological Concepts [article]

Junheng Hao, Muhao Chen, Wenchao Yu, Yizhou Sun, Wei Wang
2021 arXiv   pre-print
Our model is trained on large-scale knowledge bases that consist of massive instances and their corresponding ontological concepts connected via a (small) set of cross-view links.  ...  We explore multiple representation techniques for the two model components and investigate with nine variants of JOIE.  ...  Figure 1 shows a snapshot of such a KB. In the past decade, KG embedding models have been widely investigated.  ... 
arXiv:2103.08115v1 fatcat:qr6jgcvic5ffrinfmj5odlzbi4

CHOLAN: A Modular Approach for Neural Entity Linking on Wikipedia and Wikidata [article]

Manoj Prabhakar Kannan Ravi, Kuldeep Singh, Isaiah Onando Mulang', Saeedeh Shekarpour, Johannes Hoffart, Jens Lehmann
2021 arXiv   pre-print
The first transformer model identifies surface forms (entity mentions) in a given text.  ...  CHOLAN consists of a pipeline of two transformer-based models integrated sequentially to accomplish the EL task.  ...  The EL task aligns the text into a subset of entities represented as Θ : W → E where E ⊂ E. We formulate the EL task as a three step process in which the first step is the mention detection (MD).  ... 
arXiv:2101.09969v2 fatcat:xpjiuy57zrh65ahasi3jlbwyki