Robust Cross-lingual Embeddings from Parallel Sentences
[article]
2020
arXiv
pre-print
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word and sentence representations. ...
Recent advances in cross-lingual word embeddings have primarily relied on mapping-based methods, which project pretrained word embeddings from different languages into a shared space through a linear transformation ...
Acknowledgments We acknowledge funding from the Innosuisse ADA grant. ...
arXiv:1912.12481v2
fatcat:onah22qti5gmrghnyi7o6h4pua
Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training
[article]
2021
arXiv
pre-print
Especially, learning alignments in the multilingual embedding space usually requires sentence-level or word-level parallel corpora, which are expensive to obtain for low-resource languages. ...
The improvement is more significant in the generalized cross-lingual transfer setting, where the pair of input sentences belong to two different languages. ...
Note that due to the parallel nature of PAWS-X and XNLI dataset, we can pair up sentences from two different languages. ...
arXiv:2104.08645v2
fatcat:qdzq7alqr5farkwuedxhsekf3u
Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining
[article]
2021
arXiv
pre-print
We first produce a synthetic parallel corpus using unsupervised machine translation, and use it to fine-tune a pretrained cross-lingual masked language model (XLM) to derive the multilingual sentence representations ...
Existing models of multilingual sentence embeddings require large parallel data resources which are not available for low-resource languages. ...
We propose a method to further align representations from such models into the cross-lingual space and use them to derive sentence embeddings. ...
arXiv:2105.10419v1
fatcat:lnc3qfaprngihhse2ras5npe3y
Context-Aware Cross-Lingual Mapping
[article]
2019
arXiv
pre-print
We also implement cross-lingual mapping of deep contextualized word embeddings using parallel sentences with word alignments. ...
In our experiments, both approaches resulted in cross-lingual sentence embeddings that outperformed context-independent word mapping in sentence translation retrieval. ...
Introduction Cross-lingual word vector models aim to embed words from multiple languages into a shared vector space to enable cross-lingual transfer and dictionary expansion (Upadhyay et al., 2016). ...
arXiv:1903.03243v2
fatcat:bgy3m2lhurdrjkhcojmgiu5pjm
Scalable Cross-Lingual Transfer of Neural Sentence Embeddings
2019
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*
Our results support representation transfer as a scalable approach for modular cross-lingual alignment of neural sentence embeddings, where we observe better performance compared to joint models in intrinsic ...
We evaluate three alignment frameworks applied to these models: joint modeling, representation transfer learning, and sentence mapping, using parallel text to guide the alignment. ...
The latter introduces textual noise on the input sentence to make the embeddings more robust. ...
doi:10.18653/v1/s19-1006
dblp:conf/starsem/AldarmakiD19
fatcat:oxgda73bpfcsphfh5vdpuhkeiy
Scalable Cross-Lingual Transfer of Neural Sentence Embeddings
[article]
2019
arXiv
pre-print
We develop and investigate several cross-lingual alignment approaches for neural sentence embedding models, such as the supervised inference classifier, InferSent, and sequential encoder-decoder models ...
Our results support representation transfer as a scalable approach for modular cross-lingual alignment of neural sentence embeddings, where we observe better performance compared to joint models in intrinsic ...
The latter introduces textual noise on the input sentence to make the embeddings more robust. ...
arXiv:1904.05542v1
fatcat:cvmxdq7kvzbhphsanx5rlq6rbi
Context-Aware Cross-Lingual Mapping
2019
Proceedings of the 2019 Conference of the North
We also implement cross-lingual mapping of deep contextualized word embeddings using parallel sentences with word alignments. ...
In our experiments, both approaches resulted in cross-lingual sentence embeddings that outperformed context-independent word mapping in sentence translation retrieval. ...
Introduction Cross-lingual word vector models aim to embed words from multiple languages into a shared vector space to enable cross-lingual transfer and dictionary expansion (Upadhyay et al., 2016). ...
doi:10.18653/v1/n19-1391
dblp:conf/naacl/AldarmakiD19
fatcat:wzw7ssx3over7h3uk6heh6siwu
A resource-light method for cross-lingual semantic textual similarity
2018
Knowledge-Based Systems
Furthermore, we evaluate the proposed method on two extrinsic tasks, namely extraction of parallel sentences from comparable corpora and cross-lingual plagiarism detection, and show that it yields performance ...
Abstract Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation ...
Extrinsic evaluations of the proposed model on two different tasks: aligning parallel sentences from comparable corpora for machine translation and cross-lingual plagiarism detection. ...
doi:10.1016/j.knosys.2017.11.041
fatcat:3ii72eswyfaetdv6gkzfisdi6i
A Resource-Light Method for Cross-Lingual Semantic Textual Similarity
[article]
2018
arXiv
pre-print
Furthermore, we evaluate the proposed method on two extrinsic tasks, namely extraction of parallel sentences from comparable corpora and cross lingual plagiarism detection, and show that it yields performance ...
Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. ...
Extrinsic evaluations of the proposed model on two different tasks: aligning parallel sentences from comparable corpora for machine translation and cross-lingual plagiarism detection. ...
arXiv:1801.06436v1
fatcat:hftd7zwksjgnjdsp2s5csbybbu
A Multi-task Approach to Learning Multilingual Representations
2018
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Our system learns word and sentence embeddings jointly by training a multilingual skipgram model together with a cross-lingual sentence similarity model. ...
Our architecture can transparently use both monolingual and sentence aligned bilingual corpora to learn multilingual embeddings, thus covering a vocabulary significantly larger than the vocabulary of the ...
We hypothesize these two aspects of approach lead to more robust sentence embeddings. ...
doi:10.18653/v1/p18-2035
dblp:conf/acl/SinglaCN18
fatcat:dnfgrpqlbfhfbbgk7tjocccdla
Zero-shot Cross-lingual Dialogue Systems with Transferable Latent Variables
[article]
2019
arXiv
pre-print
To tackle this challenge, we first use a set of very few parallel word pairs to refine the aligned cross-lingual word-level representations. ...
We then employ a latent variable model to cope with the variance of similar sentences across different languages, which is induced by imperfect cross-lingual alignments and inherent differences in languages ...
Left: We choose "weather-clima-อากาศ" and "evening-noche-เย็น" from parallel sentences. ...
arXiv:1911.04081v1
fatcat:66wloyvprzcwlgc2rovedyz4au
Zero-shot Cross-lingual Dialogue Systems with Transferable Latent Variables
2019
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
To tackle this challenge, we first use a set of very few parallel word pairs to refine the aligned cross-lingual word-level representations. ...
We then employ a latent variable model to cope with the variance of similar sentences across different languages, which is induced by imperfect cross-lingual alignments and inherent differences in languages ...
Left: We choose "weather-clima-อากาศ" and "evening-noche-เย็น" from parallel sentences. ...
doi:10.18653/v1/d19-1129
dblp:conf/emnlp/LiuSXWXMF19
fatcat:5l7jx52axzdoxk2csrvxn47mye
Lightweight Cross-Lingual Sentence Representation Learning
[article]
2022
arXiv
pre-print
We further augment the training task by the introduction of two computationally-lite sentence-level contrastive learning tasks to enhance the alignment of cross-lingual sentence representation space, which ...
In this work, we introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations. ...
Moreover, we introduce two sentence-level self-supervised learning tasks (sentence alignment and sentence similarity losses) to leverage robust parallel level supervision to better conduct the cross-lingual ...
arXiv:2105.13856v4
fatcat:bjfyvmg7grhwnpo576qxrvca6a
A Survey of Cross-lingual Word Embedding Models
2019
The Journal of Artificial Intelligence Research
We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons. ...
In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. ...
Creating robust cross-lingual word representations with as few parallel examples as possible is thus an important research avenue. ...
doi:10.1613/jair.1.11640
fatcat:vwlgtzzmhfdlnlyaokx2whxgva
A Survey Of Cross-lingual Word Embedding Models
[article]
2019
arXiv
pre-print
We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons. ...
In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. ...
Creating robust cross-lingual word representations with as few parallel examples as possible is thus an important research avenue. ...
arXiv:1706.04902v3
fatcat:lts6uop77zaazhzlbygqmdsama