A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is
We present LAReQA, a challenging new benchmark for language-agnostic answer retrieval from a multilingual candidate pool. Unlike previous cross-lingual tasks, LAReQA tests for "strong" cross-lingual alignment, requiring semantically related cross-language pairs to be closer in representation space than unrelated same-language pairs. Building on multilingual BERT (mBERT), we study different strategies for achieving strong alignment. We find that augmenting training data via machine translationarXiv:2004.05484v1 fatcat:75ilkbbezzhdvnqm4xzultmldu