Boosting Terminology Extraction through Crosslingual Resources

Sergio Cajal, Horacio Rodríguez
2014 Revista de Procesamiento de Lenguaje Natural (SEPLN)  
Introduction Terminology Extraction is an important Natural Language Processing, NLP, task with multiple applications in many areas.  ...  Term extraction (or detection) is difficult because there is no formal difference between a term and a non terminological unit of the language.  ... 
dblp:journals/pdln/CajalR14 fatcat:t2cxmmn35nh6ber5rtozlkxtbu

Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity

Ivan Vulić, Edoardo Maria Ponti, Ira Leviant, Olga Majewska, Matt Malone, Roi Reichart, Simon Baker, Ulla Petti, Kelly Wing, Eden Bar, Thierry Poibeau, Anna Korhonen
2020 Computational Linguistics  
Russian) as well as less-resourced ones (e.g., Welsh, Kiswahili).  ...  Such a large-scale semantic resource could inspire significant further advances in NLP across languages.  ...  , however, one viable strategy to steer monolingual word vector spaces to emphasize semantic similarity is through crosslingual transfer of lexical knowledge, usually through a shared crosslingual word  ... 
doi:10.1162/coli_a_00391 fatcat:42esnmz2gvgs7irdhigl6t7xtm

Cross-lingual Candidate Search for Biomedical Concept Normalization [article]

Roland Roller, Madeleine Kittner, Dirk Weissenborn, Ulf Leser
2018 arXiv   pre-print
biomedical terminology.  ...  Concept normalization of non-English biomedical text is even more challenging as non-English resources tend to be much smaller and contain less synonyms.  ...  (BMBF) through the project PERSONS (031L0030B).  ... 
arXiv:1805.01646v1 fatcat:ohcpjeltpbgndgqnkz6bj2f6aa

Neural Cross-Lingual Event Detection with Minimal Parallel Resources

Jian Liu, Yubo Chen, Kang Liu, Jun Zhao
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
Crosslingual ED aims to tackle this challenge by transferring knowledge between different languages to boost performance.  ...  The efficiency of our method is studied through extensive experiments on two standard datasets.  ...  Multilingual Co-Training To enable multilingual co-training, we adopt the cross-entropy loss, and we use λ to balance the contribution of multilingual resources (which is set as 0.7 through a grid search  ... 
doi:10.18653/v1/d19-1068 dblp:conf/emnlp/LiuCLZ19 fatcat:oehg3ovjhbcjzczgrcw2uv5kqe

Lynx D3.9 Information Retrieval and Recommender Services

Maria Khvalchik, Artem Revenko, Christian Sageder, Víctor Rodríguez-Doncel, Pablo Calleja-Ibáñez
2020 Zenodo  
through corpora.  ...  Terminology Query enriches the question with linguistic information from domain dependent vocabularies (terminologies) and domain independent vocabularies (dictionaries) to facilitate efficient searching  ...  Concept extraction and enrichment are done by SEAR to boost searching for relevant documents.  ... 
doi:10.5281/zenodo.3870456 fatcat:s4yfxdewlje5rfu6ewecwdeuti

Web Track for CLEF2005 at Alicante University

Trinitario Martínez, Elisa Noguera, Rafael Muñoz, Fernando Llopis
2005 Conference and Labs of the Evaluation Forum  
Retrieving in a Multi/Crosslingual manner is a natural and common established way for carrying out web searches.  ...  Our major lack here is the necessity of resources (stemmers, stopwords lists and so on).  ... 
dblp:conf/clef/MartinezNML05a fatcat:criik23efjayfdjdgdb7jrtqri

Lynx D2.2 Intermediate report on Lynx acquired vocabularies

Ilan Kernerman, Patricia Martín Chozas, Andis Lagzdiņš
2018 Zenodo  
These cover both general language and legal terminology, including resources that already exist and others to be added.  ...  The existing vocabularies ensue mainly from two Lynx partners (TILDE and KD) along with some open access terminological resources in particular, and are complemented by resources specifically created to  ...  Resources identified here will be managed through TILDE's Terminology service, which also allows the management of private term collections.  ... 
doi:10.5281/zenodo.1745231 fatcat:mg3qiyrdebdvjfdkpn4irbgdsq

Multilingual Open Information Extraction: Challenges and Opportunities

Daniela Barreiro Claro, Marlo Souza, Clarissa Castellã Xavier, Leandro Oliveira
2019 Information  
As a consequence, the need to extract useful information from different languages increases, highlighting the importance of research into Open Information Extraction (OIE) techniques.  ...  In those approaches, multilingualism is restricted to processing text in different languages, rather than exploring cross-linguistic resources, which results in low precision due to the use of general  ...  With this, we want to evaluate how much novel information extracted in one language can boost the extractions for the other.  ... 
doi:10.3390/info10070228 fatcat:pisgymdwx5edvffnbu5sgcu32e

Experiments with Citation Mining and Key-Term Extraction for Prior Art Search

Patrice Lopez, Laurent Romary
2010 Conference and Labs of the Evaluation Forum  
of our multi-domain terminological database called GRISP.  ...  . • A key-term extraction module developed for technical and scientific documents and adapted to patent document structures using a vast ranges of metrics, features and a bagged decision tree. • An improvement  ...  Citation relations between patents through time are manifestations of technological improvements and evolutions.  ... 
dblp:conf/clef/LopezR10 fatcat:tdqgibes6nd2fojwbb2zset4em

Lynx D2.5 Report on Lynx acquired vocabularies

Ilan Kernerman, Patricia Martín Chozas, Andis Lagzdiņš, Jorge Gracia
2019 Zenodo  
The existing vocabularies ensue mainly from two Lynx partners (TILDE and KD), along with some open access terminological resources in particular, and are complemented by resources specifically created  ...  This deliverable constitutes the final report on the vocabulary resources collected and generated in the Lynx project, covering both general language and legal terminology.  ...  This corpus is processed through TILDE's Terminology Extraction service (TermEx).  ... 
doi:10.5281/zenodo.3558710 fatcat:sx7zrwlcc5cjjpc4t5euywfqgq

Discovering Representation Sprachbund For Multilingual Pre-Training [article]

Yimin Fan, Yaobo Liang, Alexandre Muzio, Hany Hassan, Houqiang Li, Ming Zhou, Nan Duan
2021 arXiv   pre-print
Multilingual pre-trained models have demonstrated their effectiveness in many multilingual NLP tasks and enabled zero-shot or few-shot transfer from high-resource languages to low resource ones.  ...  Thus, languages in the same representation sprachbund are supposed to boost each other in both pre-training and fine-tuning as they share rich linguistic similarity.  ...  for bridging the large performance gap between high resource and low resource languages.  ... 
arXiv:2109.00271v1 fatcat:2tju4rui2jeuvlcef5up4gjdvq

A Survey on Awesome Korean NLP Datasets [article]

Byunghyun Ban
2021 arXiv   pre-print
Multilingual and Crosslingual Focused Evaluation." Proceedings of the [48] Cho, Won Ik, Seok Min Kim, and Nam Soo Kim.  ...  KLUE dataset was created through collaboration among academia and industry, and S1.  ... 
arXiv:2112.01624v2 fatcat:xkq767m67nfypgz4nehle7pgci

Modeling Event Extraction via Multilingual Data Sources

Andrew Hsi, Jaime G. Carbonell, Yiming Yang
2015 Text Analysis Conference  
We focus in particular on development of multilingual event extraction through the combination of language-dependent and language-independent features.  ...  how to adapt event extraction toward low-resource settings.  ...  In Section 2, we introduce some necessary terminology used in the event extraction community.  ... 
dblp:conf/tac/HsiCY15 fatcat:wofvjynavffjpiyhnm2iml7s64

Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [article]

Saadullah Amin, Noon Pokaratsiri Goldstein, Morgan Kelly Wixted, Alejandro García-Rudolph, Catalina Martínez-Costa, Günter Neumann
2022 arXiv   pre-print
Pre-trained language models (LM) have shown great potential for cross-lingual transfer in low-resource settings.  ...  These texts, which often contain protected health information (PHI), are exposed to information extraction tools for downstream applications, risking patient identification.  ...  The authors also acknowledge the cluster compute resources provided by the DFKI. We avoid releasing our dataset due to presence of real PHI information.  ... 
arXiv:2204.04775v1 fatcat:ai5t6ulki5anpm3kp4vxyt64he

Language and Domain Aware Lightweight Ontology Matching

Gabor Bella, Fausto Giunchiglia, Fiona McNeill
2017 Social Science Research Network  
We also design and evaluate a fusion matcher that combines the outputs of the two techniques in order to boost precision or recall beyond the results produced by either technique alone.  ...  Wordnets are domain-independent resources that lack specialised domain terminology.  ...  Often developed through research or community efforts, these resources offer variable levels of lexical coverage.  ... 
doi:10.2139/ssrn.3199131 fatcat:rrybtnkqt5cyxhouzmxqzghknu