
Leveraging Monolingual Data for Crosslingual Compositional Word Representations [article]

Hubert Soyer and Pontus Stenetorp and Akiko Aizawa
2015 arXiv   pre-print
In this work, we present a novel neural-network-based architecture for inducing compositional crosslingual word representations.  ...  Unlike previously proposed methods, our method fulfills the following three criteria: it constrains the word-level representations to be compositional, it is capable of leveraging both bilingual and monolingual  ...  1. Constrain the word-level representations to be compositional. 2. Leverage both monolingual and bilingual data. 3. Scale to large vocabulary sizes without greatly impacting training time.  ...
arXiv:1412.6334v4 fatcat:2qdsy3zi2jhklpffvedchaaq54
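
A rough sketch of the idea this entry describes, assuming the simplest compositional choice (a phrase vector is the average of its word vectors) and a squared-error term pulling aligned phrase pairs together; all names and the toy data are illustrative, not taken from the paper's implementation:

```python
import numpy as np

def compose(word_ids, E):
    """Additive composition: a phrase vector is the mean of its word vectors."""
    return E[word_ids].mean(axis=0)

def bilingual_loss(src_ids, tgt_ids, E_src, E_tgt):
    """Squared distance between composed representations of aligned phrases."""
    diff = compose(src_ids, E_src) - compose(tgt_ids, E_tgt)
    return float(diff @ diff)

# Toy example: two 4-word vocabularies with 8-dimensional embeddings.
rng = np.random.default_rng(0)
E_en, E_ja = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(bilingual_loss([0, 2], [1, 3], E_en, E_ja))
```

Minimizing a term like this alongside ordinary monolingual objectives is what lets a method use both bilingual and monolingual data, per criterion 2 above.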

CroVeWA: Crosslingual Vector-Based Writing Assistance

Hubert Soyer, Goran Topić, Pontus Stenetorp, Akiko Aizawa
2015 Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations  
We present an interactive web-based writing assistance system that is based on recent advances in crosslingual compositional distributed semantics.  ...  By employing crosslingually constrained vector space models to represent phrases, our system naturally sidesteps several difficulties that would arise from direct word-to-text matching, and is able to  ...  Acknowledgements: This work was supported by the Data Centric Science Research Commons Project at the Research Organization of Information and Systems and by the Japan Society for the Promotion of Science  ...
doi:10.3115/v1/n15-3019 dblp:conf/naacl/SoyerTSA15 fatcat:gnceffmyu5ehbonxnbvdzp5vla
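
A system like this ultimately reduces lookup to nearest-neighbour search in the shared vector space rather than word-to-text matching. A minimal, hypothetical retrieval step (cosine similarity over precomputed phrase vectors; not the demo's actual code):

```python
import numpy as np

def top_k(query_vec, phrase_matrix, k=3):
    """Rank candidate phrases by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    P = phrase_matrix / np.linalg.norm(phrase_matrix, axis=1, keepdims=True)
    return np.argsort(-(P @ q))[:k]
```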

Joint Representation Learning of Cross-lingual Words and Entities via Attentive Distant Supervision [article]

Yixin Cao and Lei Hou and Juanzi Li and Zhiyuan Liu and Chengjiang Li and Xu Chen and Tiansi Dong
2018 arXiv   pre-print
In this paper, we propose a novel method for joint representation learning of cross-lingual words and entities.  ...  Joint representation learning of words and entities benefits many NLP tasks, but has not been well explored in cross-lingual settings.  ...  Cross-lingual word and entity representation learning aims to map words and entities in different languages into a unified semantic space.  ...
arXiv:1811.10776v1 fatcat:rkyiznw2hvdjhdy4fylkxw4fay

Joint Representation Learning of Cross-lingual Words and Entities via Attentive Distant Supervision

Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu, Chengjiang Li, Xu Chen, Tiansi Dong
2018 Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing  
In this paper, we propose a novel method for joint representation learning of cross-lingual words and entities.  ...  Joint representation learning of words and entities benefits many NLP tasks, but has not been well explored in cross-lingual settings.  ...  Cross-lingual word and entity representation learning aims to map words and entities in different languages into a unified semantic space.  ...
doi:10.18653/v1/d18-1021 dblp:conf/emnlp/0002HLLLCD18 fatcat:bmbfxf7bb5clljirnliytkagou
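
The "unified semantic space" in this line of work is typically enforced by tying entity vectors to the words that mention them. A hypothetical stand-in for such a distant-supervision signal (not the paper's attentive variant, which would weight mention words rather than averaging them):

```python
import numpy as np

def entity_alignment_loss(entity_vec, mention_word_vecs):
    """Pull an entity vector toward the (unweighted) mean of the word vectors
    occurring in its mentions; attention would replace the plain mean."""
    diff = entity_vec - mention_word_vecs.mean(axis=0)
    return float(diff @ diff)
```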

Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity

Ivan Vulić, Edoardo Maria Ponti, Ira Leviant, Olga Majewska, Matt Malone, Roi Reichart, Simon Baker, Ulla Petti, Kelly Wing, Eden Bar, Thierry Poibeau, Anna Korhonen
2020 Computational Linguistics  
word embeddings (such as fastText, monolingual and multilingual BERT, XLM), externally informed lexical representations, as well as fully unsupervised and (weakly) supervised cross-lingual word embeddings  ...  Each language data set is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs, providing a representative coverage of word classes (nouns, verbs  ...  The crosslingual Multi-SimLex data sets are constructed automatically, leveraging word pair translations and annotations collected in all 12 languages.  ...
doi:10.1162/coli_a_00391 fatcat:42esnmz2gvgs7irdhigl6t7xtm
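
Benchmarks of this kind are usually scored by correlating model similarities with the human ratings. A generic sketch of that protocol (Spearman's rho over cosine similarities; `emb` is assumed to map words to vectors):

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate_similarity(pairs, human_scores, emb):
    """Spearman correlation between embedding cosine similarities and
    human similarity ratings for a list of (word1, word2) pairs."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    model_scores = [cos(emb[a], emb[b]) for a, b in pairs]
    return spearmanr(model_scores, human_scores).correlation
```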

Learning Cross-lingual Word Embeddings via Matrix Co-factorization

Tianze Shi, Zhiyuan Liu, Yang Liu, Maosong Sun
2015 Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)  
A joint-space model for cross-lingual distributed representations generalizes language-invariant semantic features.  ...  We explicitly define monolingual training objectives in the form of matrix decomposition, and induce cross-lingual constraints for simultaneously factorizing monolingual matrices.  ...  We thank the anonymous reviewers for the valuable comments.  ... 
doi:10.3115/v1/p15-2093 dblp:conf/acl/ShiLLS15 fatcat:av5lk72avvex3ked7wsy6eivtq
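
The objective described here can be pictured as two monolingual reconstruction terms plus a penalty tying translation pairs together. A loss-only sketch under that reading (the shapes and the pair-penalty form are assumptions, not the paper's exact formulation):

```python
import numpy as np

def cofactorization_loss(X_e, X_f, W_e, C_e, W_f, C_f, pairs, lam=1.0):
    """Monolingual co-occurrence matrices X_e, X_f are factorized into word
    and context embeddings (W, C); translation pairs (i, j) are pulled close."""
    loss = np.sum((X_e - W_e @ C_e.T) ** 2) + np.sum((X_f - W_f @ C_f.T) ** 2)
    for i, j in pairs:
        loss += lam * np.sum((W_e[i] - W_f[j]) ** 2)
    return float(loss)
```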

A Multi-task Approach to Learning Multilingual Representations

Karan Singla, Dogan Can, Shrikanth Narayanan
2018 Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)  
We present a novel multi-task modeling approach to learning multilingual distributed representations of text.  ...  Our system learns word and sentence embeddings jointly by training a multilingual skipgram model together with a cross-lingual sentence similarity model.  ...  data size seems to be a good heuristic for choosing monolingual data size.  ... 
doi:10.18653/v1/p18-2035 dblp:conf/acl/SinglaCN18 fatcat:dnfgrpqlbfhfbbgk7tjocccdla
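
A plausible reading of the joint training setup, with the two tasks combined as a weighted sum and sentence embeddings taken as means of shared word embeddings; the interpolation weight and pooling choice are illustrative assumptions:

```python
import numpy as np

def sentence_vec(word_ids, E):
    """Sentence embedding as the mean of its word embeddings."""
    return E[word_ids].mean(axis=0)

def joint_loss(skipgram_term, src_ids, tgt_ids, E, alpha=0.5):
    """Weighted sum of a monolingual skip-gram term and a cross-lingual
    sentence-similarity term (squared distance of aligned sentence means)."""
    diff = sentence_vec(src_ids, E) - sentence_vec(tgt_ids, E)
    return alpha * skipgram_term + (1 - alpha) * float(diff @ diff)
```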

Code-switching Language Modeling With Bilingual Word Embeddings: A Case Study for Egyptian Arabic-English [article]

Injy Hamed, Moritz Zhu, Mohamed Elmahdy, Slim Abdennadher, Ngoc Thang Vu
2019 arXiv   pre-print
word representations without the use of any parallel data, relying only on monolingual and a small amount of CS data.  ...  In this work, we explore the potential use of bilingual word embeddings for code-switching (CS) language modeling (LM) in the low resource Egyptian Arabic-English language.  ...  Several approaches have been proposed for building bilingual word embeddings, where the bilingual word representations across multiple languages can be jointly learned, or where independently-learned monolingual  ... 
arXiv:1909.10892v1 fatcat:5xo3uhacg5aarekkructmhzsb4
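
One standard way to build bilingual embeddings without parallel text, as alluded to in the snippet above, is to train monolingual embeddings independently and then map one space onto the other using a small seed dictionary. A sketch of the orthogonal (Procrustes) mapping, which may or may not be the variant used in this particular paper:

```python
import numpy as np

def procrustes_map(X_src, X_tgt):
    """Orthogonal matrix W minimizing ||X_src @ W.T - X_tgt||, where rows of
    X_src and X_tgt are embeddings of seed-dictionary translation pairs."""
    U, _, Vt = np.linalg.svd(X_tgt.T @ X_src)
    return U @ Vt  # map the full source space with: X_src_all @ W.T
```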

Robust Cross-lingual Embeddings from Parallel Sentences [article]

Ali Sabet, Prakhar Gupta, Jean-Baptiste Cordonnier, Robert West, Martin Jaggi
2020 arXiv   pre-print
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word and sentence representations.  ...  As an additional advantage, our bilingual method leads to a much more pronounced improvement in the quality of monolingual word vectors compared to other competing methods.  ...  Faruqui and Dyer (2014) show that training on parallel data additionally enriches monolingual representation quality.  ...
arXiv:1912.12481v2 fatcat:onah22qti5gmrghnyi7o6h4pua
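
The bilingual twist on CBOW described here can be caricatured as scoring a target word against a context pooled from the aligned sentence in the other language; the exact pooling and scoring below are assumptions:

```python
import numpy as np

def bilingual_cbow_score(target_id, aligned_context_ids, E_ctx, E_out):
    """CBOW-style dot-product score of a target word given context words
    drawn from the parallel sentence in the other language."""
    h = E_ctx[aligned_context_ids].mean(axis=0)
    return float(E_out[target_id] @ h)
```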

Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders [article]

Simon Šuster and Ivan Titov and Gertjan van Noord
2016 arXiv   pre-print
We observe that the word representations induced from bilingual data outperform the monolingual counterparts across a range of evaluation tasks, even though crosslingual information is not available at  ...  Our model consists of an encoder, which uses monolingual and bilingual context (i.e. a parallel sentence) to choose a sense for a given word, and a decoder which predicts context words based on the chosen  ...  Acknowledgments: We would like to thank Jiwei Li for providing his tagger implementation, and Robert Grimm, Diego Marcheggiani and the anonymous reviewers for useful comments.  ...
arXiv:1603.09128v1 fatcat:oheyc3avnzhstashgazhehoxwq

Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders

Simon Šuster, Ivan Titov, Gertjan van Noord
2016 Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies  
We observe that the word representations induced from bilingual data outperform the monolingual counterparts across a range of evaluation tasks, even though crosslingual information is not available at  ...  Our model consists of an encoder, which uses monolingual and bilingual context (i.e. a parallel sentence) to choose a sense for a given word, and a decoder which predicts context words based on the chosen  ...  Acknowledgments: We would like to thank Jiwei Li for providing his tagger implementation, and Robert Grimm, Diego Marcheggiani and the anonymous reviewers for useful comments.  ...
doi:10.18653/v1/n16-1160 dblp:conf/naacl/SusterTN16 fatcat:ndi2vnvbujbqxfbpapxkkruhmi
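
The "discrete" part of the autoencoder is the hard sense choice made by the encoder. A minimal sketch of that step, assuming senses are scored by dot product against a pooled context vector (monolingual context plus, at training time, the parallel sentence):

```python
import numpy as np

def choose_sense(sense_vecs, context_vec):
    """Encoder step: pick the sense whose vector best matches the pooled
    context; the decoder would then predict context words from that sense."""
    return int(np.argmax(sense_vecs @ context_vec))
```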

Unsupervised Cross-Lingual Information Retrieval Using Monolingual Data Only

Robert Litschko, Goran Glavaš, Simone Paolo Ponzetto, Ivan Vulić
2018 The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval - SIGIR '18  
The framework leverages shared cross-lingual word embedding spaces in which terms, queries, and documents can be represented, irrespective of their actual language.  ...  We propose a fully unsupervised framework for ad-hoc cross-lingual information retrieval (CLIR) which requires no bilingual data at all.  ...  Conclusion: We have presented a fully unsupervised CLIR framework that leverages unsupervised cross-lingual word embeddings induced solely on the basis of monolingual corpora.  ...
doi:10.1145/3209978.3210157 dblp:conf/sigir/LitschkoGPV18 fatcat:ine474q52bh5hl66my6jif3kaa
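
Under this framework, queries and documents in different languages become directly comparable vectors. A generic aggregation-and-ranking sketch (mean pooling and cosine scoring are one of several variants such a framework could use):

```python
import numpy as np

def clir_rank(query_terms, docs, emb):
    """Embed query and documents as normalized mean vectors in the shared
    cross-lingual space and return document indices sorted by cosine score."""
    def mean_vec(terms):
        v = np.mean([emb[t] for t in terms if t in emb], axis=0)
        return v / np.linalg.norm(v)
    q = mean_vec(query_terms)
    scores = [float(mean_vec(d) @ q) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])
```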

A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages

Clara Vania, Yova Kementchedjhieva, Anders Søgaard, Adam Lopez
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
Parsers are available for only a handful of the world's languages, since they require lots of training data. How far can we get with just a small amount of training data?  ...  We systematically compare a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and transliteration.  ...  Acknowledgments: Clara Vania is supported by the Indonesian Endowment Fund for Education (LPDP), the Centre for Doctoral Training in Data Science, funded by the UK EPSRC (grant EP/L016427/1), and the University  ...
doi:10.18653/v1/d19-1102 dblp:conf/emnlp/VaniaKSL19 fatcat:w6tt4bygofefxbtgdepnoktfku

A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages [article]

Clara Vania, Yova Kementchedjhieva, Anders Søgaard, Adam Lopez
2019 arXiv   pre-print
Parsers are available for only a handful of the world's languages, since they require lots of training data. How far can we get with just a small amount of training data?  ...  We systematically compare a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and transliteration.  ...  Acknowledgments: Clara Vania is supported by the Indonesian Endowment Fund for Education (LPDP), the Centre for Doctoral Training in Data Science, funded by the UK EPSRC (grant EP/L016427/1), and the University  ...
arXiv:1909.02857v1 fatcat:uuraw5arvvg3bpyje4s3eqbdey
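
Of the strategies compared, data augmentation is the easiest to picture in code. A hypothetical cropping-style augmenter that shortens a gold tree into an extra training example (the paper's actual operations on dependency trees may differ):

```python
def crop_sentence(tokens, heads, root_idx, keep):
    """Produce a shorter synthetic example by keeping the root word and a
    chosen subset `keep` of its direct dependents (indices into `tokens`)."""
    kept = sorted([root_idx] + [i for i, h in enumerate(heads)
                                if h == root_idx and i in keep])
    return [tokens[i] for i in kept]
```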

Cross-lingual Models of Word Embeddings: An Empirical Comparison [article]

Shyam Upadhyay, Manaal Faruqui, Chris Dyer, Dan Roth
2016 arXiv   pre-print
Despite interest in using cross-lingual knowledge to learn word embeddings for various tasks, a systematic comparison of the possible approaches is lacking in the literature.  ...  Approved for Public Release, Distribution Unlimited. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S.  ...  Figure 1: (Above) A general schema for induction of crosslingual word vector representations.  ...
arXiv:1604.00425v2 fatcat:zocq637ljzhxjbzca2unwtzmfi
Showing results 1 — 15 out of 203 results