A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf
.
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
[article]
2019
arXiv
pre-print
We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts. Our system uses a single BiLSTM encoder with a shared BPE vocabulary for all languages, which is coupled with an auxiliary decoder and trained on publicly available parallel corpora. This enables us to learn a classifier on top of the resulting embeddings using English annotated data only, and transfer it to any of
arXiv:1812.10464v2
fatcat:u7ga6lnk7jazbffwiox5fdc754