
Robust Cross-lingual Embeddings from Parallel Sentences [article]

Ali Sabet, Prakhar Gupta, Jean-Baptiste Cordonnier, Robert West, Martin Jaggi
2020 arXiv   pre-print
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word and sentence representations.  ...  Recent advances in cross-lingual word embeddings have primarily relied on mapping-based methods, which project pretrained word embeddings from different languages into a shared space through a linear transformation  ...  Acknowledgments We acknowledge funding from the Innosuisse ADA grant.  ... 
arXiv:1912.12481v2 fatcat:onah22qti5gmrghnyi7o6h4pua
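
For readers skimming this entry, the mapping-based baseline the abstract contrasts against can be illustrated in a few lines: learn a linear transformation that projects pretrained source-language word vectors into the target space from a small seed dictionary of translation pairs. The orthogonal (Procrustes) solution below is one standard choice and is a sketch only; the names and toy data are illustrative, not taken from the cited papers.

```python
# Sketch only: orthogonal (Procrustes) mapping between two pretrained embedding
# spaces, learned from a seed dictionary of translation pairs.
import numpy as np

def learn_orthogonal_mapping(src_vecs: np.ndarray, tgt_vecs: np.ndarray) -> np.ndarray:
    """src_vecs[i] and tgt_vecs[i] are embeddings of the i-th translation pair."""
    # W = argmin ||src_vecs @ W - tgt_vecs||_F subject to W being orthogonal.
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt

# Toy usage: 5 "translation pairs" in a 4-dimensional embedding space.
rng = np.random.default_rng(0)
src, tgt = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
W = learn_orthogonal_mapping(src, tgt)
projected = src @ W  # source vectors mapped into the target space
```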

Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training [article]

Kuan-Hao Huang, Wasi Uddin Ahmad, Nanyun Peng, Kai-Wei Chang
2021 arXiv   pre-print
In particular, learning alignments in the multilingual embedding space usually requires sentence-level or word-level parallel corpora, which are expensive to obtain for low-resource languages.  ...  The improvement is more significant in the generalized cross-lingual transfer setting, where the two input sentences belong to different languages.  ...  Note that due to the parallel nature of the PAWS-X and XNLI datasets, we can pair up sentences from two different languages.  ... 
arXiv:2104.08645v2 fatcat:qdzq7alqr5farkwuedxhsekf3u

Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining [article]

Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar
2021 arXiv   pre-print
We first produce a synthetic parallel corpus using unsupervised machine translation, and use it to fine-tune a pretrained cross-lingual masked language model (XLM) to derive the multilingual sentence representations  ...  Existing models of multilingual sentence embeddings require large parallel data resources which are not available for low-resource languages.  ...  We propose a method to further align representations from such models into the cross-lingual space and use them to derive sentence embeddings.  ... 
arXiv:2105.10419v1 fatcat:lnc3qfaprngihhse2ras5npe3y

Context-Aware Cross-Lingual Mapping [article]

Hanan Aldarmaki, Mona Diab
2019 arXiv   pre-print
We also implement cross-lingual mapping of deep contextualized word embeddings using parallel sentences with word alignments.  ...  In our experiments, both approaches resulted in cross-lingual sentence embeddings that outperformed context-independent word mapping in sentence translation retrieval.  ...  Introduction Cross-lingual word vector models aim to embed words from multiple languages into a shared vector space to enable cross-lingual transfer and dictionary expansion (Upadhyay et al., 2016) .  ... 
arXiv:1903.03243v2 fatcat:bgy3m2lhurdrjkhcojmgiu5pjm
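
The sentence translation retrieval evaluation mentioned here is simple to sketch: after mapping, each source sentence embedding retrieves its nearest target-language embedding by cosine similarity. The snippet below is a generic illustration, not the authors' code; the embedding matrices are assumed inputs.

```python
# Generic illustration of sentence translation retrieval with aligned embeddings;
# src_emb and tgt_emb are assumed (n_sentences x dim) matrices.
import numpy as np

def retrieve_translations(src_emb: np.ndarray, tgt_emb: np.ndarray) -> np.ndarray:
    """For each row of src_emb, return the index of the most similar row of tgt_emb."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    return (src @ tgt.T).argmax(axis=1)  # cosine similarity, then nearest neighbour

# On a parallel test set where sentence i translates sentence i:
# preds = retrieve_translations(src_emb, tgt_emb)
# accuracy = (preds == np.arange(len(preds))).mean()
```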

Scalable Cross-Lingual Transfer of Neural Sentence Embeddings

Hanan Aldarmaki, Mona Diab
2019 Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)
Our results support representation transfer as a scalable approach for modular cross-lingual alignment of neural sentence embeddings, where we observe better performance compared to joint models in intrinsic  ...  We evaluate three alignment frameworks applied to these models: joint modeling, representation transfer learning, and sentence mapping, using parallel text to guide the alignment.  ...  The latter introduces textual noise on the input sentence to make the embeddings more robust.  ... 
doi:10.18653/v1/s19-1006 dblp:conf/starsem/AldarmakiD19 fatcat:oxgda73bpfcsphfh5vdpuhkeiy
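
Of the three alignment frameworks compared in this entry, sentence mapping is the easiest to illustrate: fit a linear map from source-language sentence embeddings to the target-language embeddings of the same parallel sentences. A plain least-squares fit is shown as a sketch; the paper's exact formulation may differ.

```python
# Sketch of sentence-level mapping: a linear map fitted by least squares on
# embeddings of parallel sentences. Illustrative only.
import numpy as np

def fit_sentence_mapping(src_sent_emb: np.ndarray, tgt_sent_emb: np.ndarray) -> np.ndarray:
    """Row i of both matrices embeds the two sides of parallel sentence i."""
    W, *_ = np.linalg.lstsq(src_sent_emb, tgt_sent_emb, rcond=None)
    return W  # apply to new source-language sentence embeddings as: emb @ W
```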

Scalable Cross-Lingual Transfer of Neural Sentence Embeddings [article]

Hanan Aldarmaki, Mona Diab
2019 arXiv   pre-print
We develop and investigate several cross-lingual alignment approaches for neural sentence embedding models, such as the supervised inference classifier, InferSent, and sequential encoder-decoder models  ...  Our results support representation transfer as a scalable approach for modular cross-lingual alignment of neural sentence embeddings, where we observe better performance compared to joint models in intrinsic  ...  The latter introduces textual noise on the input sentence to make the embeddings more robust.  ... 
arXiv:1904.05542v1 fatcat:cvmxdq7kvzbhphsanx5rlq6rbi

Context-Aware Cross-Lingual Mapping

Hanan Aldarmaki, Mona Diab
2019 Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019)
We also implement cross-lingual mapping of deep contextualized word embeddings using parallel sentences with word alignments.  ...  In our experiments, both approaches resulted in cross-lingual sentence embeddings that outperformed context-independent word mapping in sentence translation retrieval.  ...  Introduction Cross-lingual word vector models aim to embed words from multiple languages into a shared vector space to enable cross-lingual transfer and dictionary expansion (Upadhyay et al., 2016) .  ... 
doi:10.18653/v1/n19-1391 dblp:conf/naacl/AldarmakiD19 fatcat:wzw7ssx3over7h3uk6heh6siwu

A resource-light method for cross-lingual semantic textual similarity

Goran Glavaš, Marc Franco-Salvador, Simone P. Ponzetto, Paolo Rosso
2018 Knowledge-Based Systems  
Furthermore, we evaluate the proposed method on two extrinsic tasks, namely extraction of parallel sentences from comparable corpora and cross-lingual plagiarism detection, and show that it yields performance  ...  Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation  ...  Extrinsic evaluations of the proposed model on two different tasks: aligning parallel sentences from comparable corpora for machine translation and cross-lingual plagiarism detection.  ... 
doi:10.1016/j.knosys.2017.11.041 fatcat:3ii72eswyfaetdv6gkzfisdi6i
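
A minimal sketch of the resource-light idea described in this abstract: embed the words of both sentences in a shared cross-lingual space, average them, and compare with cosine similarity (the full model adds a supervised similarity component on top). The embeddings dictionary and whitespace tokenization below are illustrative assumptions.

```python
# Illustrative sketch: cross-lingual sentence similarity from averaged word
# embeddings in a shared space. `embeddings` maps words (any language) to vectors;
# whitespace tokenization is an assumption made for brevity.
import numpy as np

def sentence_vector(sentence: str, embeddings: dict, dim: int) -> np.ndarray:
    vecs = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cross_lingual_similarity(s1: str, s2: str, embeddings: dict, dim: int) -> float:
    v1, v2 = sentence_vector(s1, embeddings, dim), sentence_vector(s2, embeddings, dim)
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    return float(v1 @ v2 / denom) if denom else 0.0
```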

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity [article]

Goran Glavaš, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso
2018 arXiv   pre-print
Furthermore, we evaluate the proposed method on two extrinsic tasks, namely extraction of parallel sentences from comparable corpora and cross-lingual plagiarism detection, and show that it yields performance  ...  Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation.  ...  Extrinsic evaluations of the proposed model on two different tasks: aligning parallel sentences from comparable corpora for machine translation and cross-lingual plagiarism detection.  ... 
arXiv:1801.06436v1 fatcat:hftd7zwksjgnjdsp2s5csbybbu

A Multi-task Approach to Learning Multilingual Representations

Karan Singla, Dogan Can, Shrikanth Narayanan
2018 Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)  
Our system learns word and sentence embeddings jointly by training a multilingual skipgram model together with a cross-lingual sentence similarity model.  ...  Our architecture can transparently use both monolingual and sentence-aligned bilingual corpora to learn multilingual embeddings, thus covering a vocabulary significantly larger than the vocabulary of the  ...  We hypothesize that these two aspects of the approach lead to more robust sentence embeddings.  ... 
doi:10.18653/v1/p18-2035 dblp:conf/acl/SinglaCN18 fatcat:dnfgrpqlbfhfbbgk7tjocccdla
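
The joint objective this abstract describes combines a monolingual skip-gram loss with a cross-lingual sentence similarity loss over sentence-aligned corpora, sharing one word-embedding table. The simplified stand-in below shows only the sentence-similarity term; it is not the paper's exact loss.

```python
# Simplified stand-in: mean-pool the (shared) word embeddings of an aligned
# sentence pair and push the two pooled vectors together. The skip-gram term
# would be computed on monolingual batches over the same embedding table.
import torch
import torch.nn.functional as F

def sentence_similarity_loss(src_word_vecs: torch.Tensor, tgt_word_vecs: torch.Tensor) -> torch.Tensor:
    """src_word_vecs, tgt_word_vecs: (n_words x dim) embeddings of an aligned sentence pair."""
    src_sent = src_word_vecs.mean(dim=0)
    tgt_sent = tgt_word_vecs.mean(dim=0)
    return 1.0 - F.cosine_similarity(src_sent, tgt_sent, dim=0)

# total_loss = skipgram_loss + lambda_sim * sentence_similarity_loss(src, tgt)
```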

Zero-shot Cross-lingual Dialogue Systems with Transferable Latent Variables [article]

Zihan Liu, Jamin Shin, Yan Xu, Genta Indra Winata, Peng Xu, Andrea Madotto, Pascale Fung
2019 arXiv   pre-print
To tackle this challenge, we first use a very small set of parallel word pairs to refine the aligned cross-lingual word-level representations.  ...  We then employ a latent variable model to cope with the variance of similar sentences across different languages, which is induced by imperfect cross-lingual alignments and inherent differences in languages  ...  Left: We choose "weather-clima-อากาศ" and "evening-noche-เย็น" from parallel sentences.  ... 
arXiv:1911.04081v1 fatcat:66wloyvprzcwlgc2rovedyz4au

Zero-shot Cross-lingual Dialogue Systems with Transferable Latent Variables

Zihan Liu, Jamin Shin, Yan Xu, Genta Indra Winata, Peng Xu, Andrea Madotto, Pascale Fung
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
To tackle this challenge, we first use a very small set of parallel word pairs to refine the aligned cross-lingual word-level representations.  ...  We then employ a latent variable model to cope with the variance of similar sentences across different languages, which is induced by imperfect cross-lingual alignments and inherent differences in languages  ...  Left: We choose "weather-clima-อากาศ" and "evening-noche-เย็น" from parallel sentences.  ... 
doi:10.18653/v1/d19-1129 dblp:conf/emnlp/LiuSXWXMF19 fatcat:5l7jx52axzdoxk2csrvxn47mye

Lightweight Cross-Lingual Sentence Representation Learning [article]

Zhuoyuan Mao, Prakhar Gupta, Pei Wang, Chenhui Chu, Martin Jaggi, Sadao Kurohashi
2022 arXiv   pre-print
We further augment the training task by the introduction of two computationally-lite sentence-level contrastive learning tasks to enhance the alignment of cross-lingual sentence representation space, which  ...  In this work, we introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations.  ...  Moreover, we introduce two sentence-level self-supervised learning tasks (sentence alignment and sentence similarity losses) to leverage robust parallel level supervision to better conduct the cross-lingual  ... 
arXiv:2105.13856v4 fatcat:bjfyvmg7grhwnpo576qxrvca6a
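
The sentence-level contrastive alignment task mentioned in this abstract can be sketched with a generic InfoNCE-style loss over a batch of parallel pairs: each source embedding should score highest against its own translation and lower against the other targets in the batch. This is an assumed, generic formulation rather than the paper's actual objective.

```python
# Assumed, generic InfoNCE-style contrastive alignment loss over a batch of
# parallel sentence pairs (row i of src is aligned with row i of tgt).
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(src: torch.Tensor, tgt: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    src = F.normalize(src, dim=1)
    tgt = F.normalize(tgt, dim=1)
    logits = src @ tgt.T / temperature      # scaled cosine similarities
    labels = torch.arange(src.size(0))      # the positive for row i is column i
    return F.cross_entropy(logits, labels)
```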

A Survey of Cross-lingual Word Embedding Models

Sebastian Ruder, Ivan Vulić, Anders Søgaard
2019 The Journal of Artificial Intelligence Research  
We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.  ...  In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions.  ...  Creating robust cross-lingual word representations with as few parallel examples as possible is thus an important research avenue.  ... 
doi:10.1613/jair.1.11640 fatcat:vwlgtzzmhfdlnlyaokx2whxgva

A Survey Of Cross-lingual Word Embedding Models [article]

Sebastian Ruder, Ivan Vulić, Anders Søgaard
2019 arXiv   pre-print
We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.  ...  In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions.  ...  Creating robust cross-lingual word representations with as few parallel examples as possible is thus an important research avenue.  ... 
arXiv:1706.04902v3 fatcat:lts6uop77zaazhzlbygqmdsama
Showing results 1 — 15 out of 2,285 results