From Machine Translated NLI Corpus to Universal Sentence Representations in Czech

Martin Víta
2020 Position Papers of the 2020 Federated Conference on Computer Science and Information Systems  
In order to fill this gap, we propose a methodology for obtaining universal sentence embeddings in another language, arising from training INFERSENT-based sentence encoders on a machine-translated NLI corpus  ...  As already shown, certain deep learning architectures for the NLI task, INFERSENT in particular, may be exploited for obtaining (supervised) universal sentence embeddings.  ...  Machine-Translated SNLI (Czech Version of the SNLI Corpus): In order to obtain a Czech NLI-annotated corpus, we chose a (machine) translation approach.  ... 
doi:10.15439/2020f212 dblp:conf/fedcsis/Vita20 fatcat:lsbblbctdbfajnsunz4vbm5kka

Non-native text analysis: A survey

SEAN MASSUNG, CHENGXIANG ZHAI
2015 Natural Language Engineering  
Even aside from massive open online courses, the number of English learners in Asia alone is in the tens of millions.  ...  Then, an introduction to native language identification follows: determining the native language of an author based on text in the second language.  ...  They train a statistical machine translation model on a parallel Chinese-English corpus to correct collocation errors in the NUCLE corpus.  ... 
doi:10.1017/s1351324915000303 fatcat:zhxt3rzmnfg7tlm6q3fqi2gkei

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond [article]

Mikel Artetxe, Holger Schwenk
2019 arXiv   pre-print
We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.  ...  We also introduce a new test set of aligned sentences in 112 languages, and show that our sentence embeddings obtain strong results in multilingual similarity search even for low-resource languages.  ...  In this work, we are interested in universal language agnostic sentence embeddings, that is, vector representations of sentences that are general with respect to two dimensions: the input language and  ... 
arXiv:1812.10464v2 fatcat:u7ga6lnk7jazbffwiox5fdc754

SlovakBERT: Slovak Masked Language Model [article]

Matúš Pikuliak, Štefan Grivalský, Martin Konôpka, Miroslav Blšták, Martin Tamajka, Viktor Bachratý, Marián Šimko, Pavol Balážik, Michal Trnka, Filip Uhlárik
2021 arXiv   pre-print
We introduce a new Slovak masked language model called SlovakBERT in this paper. It is the first Slovak-only transformer-based model trained on a sizeable corpus.  ...  We segmented the resulting corpus into sentences and removed duplicates to get 181.6M unique sentences. In total, the final corpus has 19.35GB of text.  ...  We also experimented with Slovak-translated NLI data in a way where the model was first fine-tuned on the NLI task and then the final STS fine-tuning was performed.  ... 
arXiv:2109.15254v1 fatcat:5axhetcb2ngvrnjl3n3ajuhx5e

Contextual Lensing of Universal Sentence Representations [article]

Jamie Kiros
2020 arXiv   pre-print
We break the construction of universal sentence vectors into a core, variable length, sentence matrix representation equipped with an adaptable 'lens' from which fixed-length vectors can be induced as  ...  In this work we propose Contextual Lensing, a methodology for inducing context-oriented universal sentence vectors.  ...  Acknowledgements The author would like to thank Geoff Hinton, Mohammad Norouzi and Felix Hill for their feedback.  ... 
arXiv:2002.08866v1 fatcat:7p3kk2w3ajhvbdtst3kac2fv7y

Multilingual native language identification

SHERVIN MALMASI, MARK DRAS
2015 Natural Language Engineering  
We present the first comprehensive study of Native Language Identification (NLI) applied to text written in languages other than English, using data from six languages.  ...  Most research to date has focused on English, but there is a need to apply NLI to other languages, not only to gauge its applicability but also to aid in teaching research for other emerging languages.  ...  We would also like to thank Ilmari Ivaska and Kirsti Siitonen for making the Finnish learner data available. We also thank Anne Ife for providing the Spanish learner corpus.  ... 
doi:10.1017/s1351324915000406 fatcat:q2m6p3lojvgbllapkhv6n56roe

Are BLEU and Meaning Representation in Opposition?

Ondřej Cífka, Ondřej Bojar
2018 Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
One possible way of obtaining continuous-space sentence representations is by training neural machine translation (NMT) systems.  ...  The recent attention mechanism, however, removes the single point in the neural network from which the source sentence representation can be extracted.  ...  Acknowledgement This work has been supported by the grants 18-24210S of the Czech Science Foundation, SVV 260 453 and "Progress" Q18+Q48 of Charles University, and using language resources distributed  ... 
doi:10.18653/v1/p18-1126 dblp:conf/acl/BojarC18 fatcat:bb75bzprl5gubjq2n4fgg7ehfy

A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation

Raúl Vázquez, Alessandro Raganato, Mathias Creutz, Jörg Tiedemann
2020 Computational Linguistics  
Neural machine translation has considerably improved the quality of automatic translations by learning good representations of input sentences.  ...  In contrast to related previous work, we demonstrate that there is no conflict between translation performance and the use of sentence representations in downstream tasks.  ...  The authors gratefully acknowledge the support of the Academy of Finland through project 314062 from the ICT 2023 call on Computation, Machine Learning and Artificial Intelligence and projects 270354 and  ... 
doi:10.1162/coli_a_00377 fatcat:f2vtkibcp5fn3fkfxvqy4enf54

Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots [article]

Samson Tan, Shafiq Joty
2021 arXiv   pre-print
In multilingual communities, it is common for polyglots to code-mix when conversing with each other.  ...  Inspired by this phenomenon, we present two strong black-box adversarial attacks (one word-level, one phrase-level) for multilingual models that push their ability to handle code-mixed sentences to the  ...  For sentence-pair classification tasks like NLI, we use a per-sentence n to further increase variation.  ... 
arXiv:2103.09593v3 fatcat:epgdk4dr3zg7bn5jjqpaediwzy

Exploring Methods and Resources for Discriminating Similar Languages

Marco Lui, Ned Letcher, Oliver Adams, Long Duong, Paul Cook, Timothy Baldwin
2014 Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects  
We present the text representations and modeling techniques used, including cross-lingual POS tagging as well as fine-grained tags extracted from a deep grammar of English, and discuss additional data  ...  The Discriminating between Similar Languages (DSL) shared task at VarDial challenged participants to build an automatic language identification system to discriminate between 13 languages in 6 groups of  ...  Acknowledgments The authors wish to thank Li Wang, Rebecca Dridan and Bahar Salehi for their kind assistance with this research.  ... 
doi:10.3115/v1/w14-5315 dblp:conf/vardial/LuiLADCB14 fatcat:ci2px3whwvel5efx6sqs7tpt3i

Language Embeddings for Typology and Cross-lingual Transfer Learning [article]

Dian Yu and Taiqi He and Kenji Sagae
2021 arXiv   pre-print
We explore whether language representations that capture relationships among languages can be learned and subsequently leveraged in cross-lingual tasks without the use of parallel data.  ...  Cross-lingual language tasks typically require a substantial amount of annotated data or parallel translation data.  ...  Europarl: A Parallel Corpus for Statistical Machine Translation. In Conference Proceedings: the tenth Machine Translation Summit, pages 79-86, Phuket, Thailand. AAMT, AAMT.  ... 
arXiv:2106.02082v1 fatcat:l4sdumpujvg2dbipbq4fz6pj5a

XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation [article]

Subhabrata Mukherjee, Ahmed Hassan Awadallah, Jianfeng Gao
2021 arXiv   pre-print
In this work, we develop a new task-agnostic distillation framework XtremeDistilTransformers that leverages the advantage of task-specific methods for learning a small universal model that can be applied  ...  to arbitrary tasks and languages.  ...  Please refer to the ReadMe for the attached code for details. References  ... 
arXiv:2106.04563v2 fatcat:he6tccgf6nflhkflq2bcy7o2um

A Survey of the Usages of Deep Learning in Natural Language Processing [article]

Daniel W. Otter, Julian R. Medina, Jugal K. Kalita
2019 arXiv   pre-print
Analyzed research areas include several core linguistic processing issues in addition to a number of applications of computational linguistics.  ...  This survey provides a brief introduction to the field and a quick overview of deep learning architectures and methods.  ...  A number of translation models were constructed, all translating from English to French, German, Czech, Arabic, or Hebrew.  ... 
arXiv:1807.10854v3 fatcat:ajyv5o743naixeo5c5y6p6tg3e

Semantic Representation and Inference for NLP [article]

Dongsheng Wang
2021 arXiv   pre-print
model composed of multi-scale CNNs with different kernel sizes that learn from external sources to infer fact-checking labels.  ...  In terms of improving semantic representations, we contribute a novel model that captures non-compositional semantic indicators.  ...  Acknowledgment This project received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 721321.  ... 
arXiv:2106.08117v1 fatcat:qi3546wlhfd2xhqj3f776wa6km
Showing results 1 — 15 out of 41 results