Cross-lingual Information Retrieval with BERT [article]

Zhuolin Jiang, Amro El-Jaroudi, William Hartmann, Damianos Karakos, Lingjun Zhao
2020 arXiv   pre-print
In this paper, we explore the use of the popular bidirectional language model, BERT, to model and learn the relevance between English queries and foreign-language documents in the task of cross-lingual  ...  Multiple neural language models have been developed recently, e.g., BERT and XLNet, and achieved impressive results in various NLP tasks including sentence classification, question answering and document  ...  The idea is to solve the translation problem first, then the cross-lingual IR problem becomes monolingual IR.  ... 
arXiv:2004.13005v1 fatcat:c4wlwr65abekrjljez3bvzipru

Introduction to the Special Issue on Cross-Language Algorithms and Applications

Marta R. Costa-jussà, Srinivas Bangalore, Patrik Lambert, Lluís Màrquez, Elena Montiel-Ponsoda
2016 The Journal of Artificial Intelligence Research  
The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency  ...  development of the science of multi- and cross-lingualism.  ...  Acknowledgements The authors want to thank Dan Roth, Mark Sammons and an anonymous reviewer for their useful comments and suggestions on previous versions of this document.  ... 
doi:10.1613/jair.5022 fatcat:h63kjmerufgkxh3qstvegklcyy

A Deep Insight in Challenges of Natural Language Processing and Usage of Deep Learning

2020 VOLUME-8 ISSUE-10, AUGUST 2019, REGULAR ISSUE  
The complexity and diversity of the huge datasets have raised the requirement for automatic analysis of the linguistic data by using data-driven approaches.  ...  It has also enhanced the effectiveness of the communication between humans and computers.  ...  The development of cross-lingual datasets has shown improved performance in cross-lingual models.  ... 
doi:10.35940/ijitee.l1014.10812s319 fatcat:hse4ddnfnvfrvmy7iikebnrgne

Cross-language Information Retrieval [article]

Petra Galuščáková, Douglas W. Oard, Suraj Nair
2021 arXiv   pre-print
In such cases, Cross-Language Information Retrieval (CLIR) is needed. This chapter reviews the state of the art for cross-language information retrieval and outlines some open research questions.  ...  Two key assumptions shape the usual view of ranked retrieval: (1) that the searcher can choose words for their query that might appear in the documents that they wish to see, and (2) that ranking retrieved  ...  Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.  ... 
arXiv:2111.05988v1 fatcat:fgnaux4lcbe5jlpczhbxka5cqq

Cross-language Sentence Selection via Data Augmentation and Rationale Training [article]

Yanda Chen, Chris Kedzie, Suraj Nair, Petra Galuščáková, Rui Zhang, Douglas W. Oard, Kathleen McKeown
2021 arXiv   pre-print
It uses data augmentation and negative sampling techniques on noisy parallel sentence data to directly learn a cross-lingual embedding-based query relevance model.  ...  Moreover, when a rationale training secondary objective is applied to encourage the model to match word alignment hints from a phrase-based statistical machine translation model, consistent improvements  ...  Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.  ... 
arXiv:2106.02293v1 fatcat:ul34wwf7lbhzlmmmda473ckur4

SARAL: A Low-Resource Cross-Lingual Domain-Focused Information Retrieval System for Effective Rapid Document Triage

Elizabeth Boschee, Joel Barry, Jayadev Billa, Marjorie Freedman, Thamme Gowda, Constantine Lignos, Chester Palen-Michel, Michael Pust, Banriskhem Kayang Khonglah, Srikanth Madikeri, Jonathan May, Scott Miller
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations  
In this paper we present SARAL, an end-to-end cross-lingual information retrieval (CLIR) and summarization system for low-resource languages that 1) enables English speakers to search foreign language repositories  ...  of text and audio using English queries, 2) summarizes the retrieved documents in English with respect to a particular information need, and 3) provides complete transcriptions and translations as needed  ...  Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.  ... 
doi:10.18653/v1/p19-3004 dblp:conf/acl/BoscheeBBFGLPPK19 fatcat:hh4k7vrkofd7xbg7neg5y37l74

Textual Similarity Measurement Approaches: A Survey (1)

Amira Abo-Elghit, Aya Al-Zoghby, Taher Hamza
2020 The Egyptian Journal of Language Engineering  
However, many approaches for measuring textual similarity have been presented for Arabic text; they are reviewed and compared in this paper.  ...  Measuring textual similarity tends to have an increasingly important turn in related topics like text classification, recovery of specific information from data, clustering, topic retrieval, subject tracking  ...  in the task for English-Chinese cross-lingual scenarios.  ... 
doi:10.21608/ejle.2020.42018.1012 fatcat:a2fhtkub7nazlkgzqewqbb7koi

Tübingen system in VarDial 2017 shared task: experiments with language identification and cross-lingual parsing

Çağrı Çöltekin, Taraka Rama
2017 Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)  
For the cross-lingual parsing task, we experimented with an approach based on automatically translating the source treebank to the target language, and training a parser on the translated treebank.  ...  We also report additional experiments with neural network models. The performance of neural network models was close but always below the corresponding SVM classifiers in the discrimination tasks.  ...  Acknowledgments The authors thank the reviewers for the comments which helped improve the paper.  ... 
doi:10.18653/v1/w17-1218 dblp:conf/vardial/ColtekinR17 fatcat:xwaxmaamivgblkthaksq3oxlnm

Universal Discourse Representation Structure Parsing

Jiangming Liu, Shay B. Cohen, Mirella Lapata, Johan Bos
2021 Computational Linguistics  
We consider the task of cross-lingual semantic parsing in the style of Discourse Representation Theory (DRT) where knowledge from annotated corpora in a resource-rich language is transferred via bitext  ...  to non-English text and trains multiple parsers (one per language) on the translations.  ...  This work was partly funded by the NWO-VICI grant "Lost in Translation -Found in Meaning" (288-89-003). References Abend, Omri and Ari Rappoport. 2013.  ... 
doi:10.1162/coli_a_00406 fatcat:4d27lle64rhdbc4kbmvh36bmbe

Transliteration Better than Translation? Answering Code-mixed Questions over a Knowledge Base

Vishal Gupta, Manoj Chinnakotla, Manish Shrivastava
2018 Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching  
Our network is trained only on English questions provided in this dataset and noisy Hindi translations of these questions and can answer English-Hindi CM questions effectively without the need of translation  ...  Back-transliterated CM questions outperform their lexical and sentence level translated counterparts by 5% & 35% respectively, highlighting the efficacy of our approach in a resource-constrained setting  ...  Cross-lingual Question Answering Closely related is the problem of cross-lingual QA.  ... 
doi:10.18653/v1/w18-3205 dblp:conf/acl-codeswitch/GuptaCS18 fatcat:cuoslyt4pbbkhpmebehn6fc7hy

Understanding Plagiarism Linguistic Patterns, Textual Features, and Detection Methods

Salha M. Alzahrani, Naomie Salim, Ajith Abraham
2012 IEEE Transactions on Systems Man and Cybernetics Part C (Applications and Reviews)  
), structural-based (STRUC), stylometric-based (STYLE), and cross-lingual techniques (CROSS).  ...  Systematic frameworks and methods of monolingual, extrinsic, intrinsic, and cross-lingual plagiarism detection are surveyed and correlated with plagiarism types, which are listed in the taxonomy.  ...  Textual Features for Cross-Lingual Plagiarism Detection Features that are based on lexical and syntactic types are improper in a cross-lingual setting, i.e., for cross-lingual text relatedness and plagiarism  ... 
doi:10.1109/tsmcc.2011.2134847 fatcat:umjzayni2bdobiwpkdtu7upir4

ADAPTING HYBRID MACHINE TRANSLATION TECHNIQUES FOR CROSS-LANGUAGE TEXT RETRIEVAL SYSTEM

P. ISWARYA, V. RADHA
2017 Journal of Engineering Science and Technology  
This research work aims at developing a Tamil-to-English cross-language text retrieval system using a hybrid machine translation approach.  ...  From the experimental results it is clear that the proposed Tamil Query based translation system achieves significantly better translation quality over existing system, and reaches 95.88% of monolingual  ...  From Table 3, the cross-lingual performance of proposed and existing system over monolingual run is 84.8%, 97.24% for title queries, 89.39%, 94.24% for descriptive queries and 89.06%, 96% for narrative  ... 
doaj:09b50c704a91433ea8d9fdc53d47d931 fatcat:rk2uw45sr5emjazuqmj7uab2km

A Survey Of Cross-lingual Word Embedding Models [article]

Sebastian Ruder, Ivan Vulić, Anders Søgaard
2019 arXiv   pre-print
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models  ...  We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.  ...  Acknowledgements We thank the anonymous reviewers for their valuable and comprehensive feedback.  ... 
arXiv:1706.04902v3 fatcat:lts6uop77zaazhzlbygqmdsama

A Large and Diverse Arabic Corpus for Language Modeling [article]

Abbas Raza Ali
2022 arXiv   pre-print
It consists of over 500 GB of Arabic cleaned text targeted at improving cross-domain knowledge and downstream generalization capability of large-scale language models.  ...  The tasks demonstrate a significant boost from 4.5 to 8.5% when compared to tasks fine-tuned on multi-lingual BERT (mBERT).  ...  Marcin Budka, Professor of Data Science at Bournemouth University, UK for taking part in the subject discussion and review of this work.  ... 
arXiv:2201.09227v1 fatcat:km6ab3j37zagjoi4wkwdo4ncfa

A Survey of Cross-lingual Word Embedding Models

Sebastian Ruder, Ivan Vulić, Anders Søgaard
2019 The Journal of Artificial Intelligence Research  
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models  ...  We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.  ...  Acknowledgements We thank the anonymous reviewers and the editors for their valuable and comprehensive feedback.  ... 
doi:10.1613/jair.1.11640 fatcat:vwlgtzzmhfdlnlyaokx2whxgva