243 Hits in 11.0 sec

A Survey Of Cross-lingual Word Embedding Models [article]

Sebastian Ruder, Ivan Vulić, Anders Søgaard
2019 arXiv   pre-print
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models  ...  In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions.  ...  Ivan's work is supported by the ERC Consolidator Grant LEXICAL: Lexical Acquisition Across Languages (no 648909).  ... 
arXiv:1706.04902v3 fatcat:lts6uop77zaazhzlbygqmdsama

Cross-Lingual Text Classification with Minimal Resources by Transferring a Sparse Teacher [article]

Giannis Karamanolakis, Daniel Hsu, Luis Gravano
2020 arXiv   pre-print
In this work, we propose a cross-lingual teacher-student method, CLTS, that generates "weak" supervision in the target language using minimal cross-lingual resources, in the form of a small number of word  ...  Existing approaches for transferring supervision across languages require expensive cross-lingual resources, such as parallel corpora, while less expensive cross-lingual representation learning approaches  ...  This material is based upon work supported by the National Science Foundation under Grant No. IIS-15-63785.  ... 
arXiv:2010.02562v1 fatcat:rggtsno3i5fcnnhzobzl6vevmq

Cross-lingual web spam classification

András Garzó, Bálint Daróczy, Tamás Kiss, Dávid Siklósi, András A. Benczúr
2013 Proceedings of the 22nd International Conference on World Wide Web - WWW '13 Companion  
In this paper we overview how existing content and link based classification techniques work, how models can be "translated" from English into another language, and how language-dependent and independent  ...  While Web spam training data exists in English, we face an expensive human labeling procedure if we want to filter a Web domain in a different language.  ...  A smarter but more complex weighting method is described in [39] . • Multi-word translation, such as Monday through Friday translated into Segunda through Sexta feira, cannot be handled based on single  ... 
doi:10.1145/2487788.2488139 dblp:conf/www/GarzoDKSB13 fatcat:j4lyxtrhibce5gmxh4jlb7lzti

A Survey of Cross-lingual Word Embedding Models

Sebastian Ruder, Ivan Vulić, Anders Søgaard
2019 The Journal of Artificial Intelligence Research  
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models  ...  In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions.  ...  Ivan's work is supported by the ERC Consolidator Grant LEXICAL: Lexical Acquisition Across Languages (no 648909). Sebastian is now affiliated with DeepMind.  ... 
doi:10.1613/jair.1.11640 fatcat:vwlgtzzmhfdlnlyaokx2whxgva

A Review on Multi-Lingual Sentiment Analysis by Machine Learning Methods

Santwana Sagnika (School of Computer Engineering, Kalinga Institute of Industrial Technology Deemed to be University, Bhubaneswar, Odisha, India), Anshuman Pattanaik, Bhabani Shankar Prasad Mishra, Saroj K. Meher
2020 Journal of Engineering Science and Technology Review  
This paper attempts to provide a detailed study on the sentiment analysis methods applied to languages other than English. The tools used, pros and cons, and efficiency of all methods are covered.  ...  This task, known as sentiment analysis, is currently a prominent area of research. Sentiment analysis can be useful for businesses, data analysts and data scientists, as well as customers.  ...  [36] developed a Cross-Lingual Joint Aspect Sentiment model that simultaneously checks aspect-based opinion expression in both languages.  ... 
doi:10.25103/jestr.132.19 fatcat:aqvglobonjbh3inz3oewnvtdsu

Introduction to the Special Issue on Cross-Language Algorithms and Applications

Marta R. Costa-jussà, Srinivas Bangalore, Patrik Lambert, Lluís Màrquez, Elena Montiel-Ponsoda
2016 The Journal of Artificial Intelligence Research  
The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency  ...  development of the science of multi- and cross-lingualism.  ...  This work has been supported by the 7th Framework Program of the European Commission through  ... 
doi:10.1613/jair.5022 fatcat:h63kjmerufgkxh3qstvegklcyy

EMBEDDIA Tools, Datasets and Challenges: Resources and Hackathon Contributions [chapter]

Senja Pollak, Marko Robnik-Šikonja, Matthew Purver, Michele Boggia, Ravi Shekhar, Marko Pranjić, Salla Salmela, Ivar Krustok, Tarmo Paju, Carl-Gustav Linden, Leo Leppänen, Elaine Zosa (+17 others)
2021 Zenodo  
The collected resources were offered to participants of a hackathon organized as part of the EACL Hackashop on News Media Content Analysis and Automated Report Generation in February 2021.  ...  Moreover, it constitutes a handy source for news media industry and researchers in the fields of Natural Language Processing and Social Science.  ...  Acknowledgements This work has been supported by the European Union's Horizon 2020 research and innovation program under grant 825153 (EMBEDDIA).  ... 
doi:10.5281/zenodo.4730463 fatcat:oc2qsn7m3zdlxj5vnx3fftsw24

On the Universality of Deep Contextual Language Models [article]

Shaily Bhatt, Poonam Goyal, Sandipan Dandapat, Monojit Choudhury, Sunayana Sitaram
2021 arXiv   pre-print
Furthermore, multilingual versions of such models like XLM-R and mBERT have given promising results in zero-shot cross-lingual transfer, potentially enabling NLP applications in many under-served and under-resourced  ...  Through this survey, we lay the foundation for understanding the capabilities and limitations of massive contextual language models and help discern research gaps and directions for future work to make  ...  They observe performance improvement in zero-shot cross-lingual  ...  representations for all languages, especially for low resource languages.  ... 
arXiv:2109.07140v2 fatcat:ygtqyhyhpzfjxcl4tsc4ocjsjq

A Survey of Code-switched Speech and Language Processing [article]

Sunayana Sitaram, Khyathi Raghavi Chandu, Sai Krishna Rallabandi, Alan W Black
2020 arXiv   pre-print
As code-switching data and resources are scarce, we list what is available in various code-switched language pairs with the language processing tasks they can be used for.  ...  We review code-switching research in various Speech and NLP applications, including language processing tools and end-to-end systems. We conclude with future directions and open problems in the field.  ...  The final embedding representation is obtained by feeding these word and character based embeddings through a stacked BiLSTM with residual connections.  ... 
arXiv:1904.00784v3 fatcat:r5tsg4kdnfbtnndae523c32pta

A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios [article]

Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, Dietrich Klakow
2021 arXiv   pre-print
As they are known for requiring large amounts of training data, there is a growing body of work to improve the performance in low-resource settings.  ...  Deep neural networks and huge language models are becoming omnipresent in natural language applications.  ...  DomBERT: Domain-oriented language model for aspect-based sentiment analysis. Findings of the As-  ...  Cross-lingual dependency parsing using code-mixed TreeBank.  ... 
arXiv:2010.12309v3 fatcat:26dwmlkmn5auha2ob2qdlrvla4

A Primer on Pretrained Multilingual Language Models [article]

Sumanth Doddapaneni, Gowtham Ramesh, Mitesh M. Khapra, Anoop Kunchukuttan, Pratyush Kumar
2021 arXiv   pre-print
variety of tasks and languages for evaluating MLLMs (iii) analysing the performance of MLLMs on monolingual, zero-shot cross-lingual and bilingual tasks (iv) understanding the universal language patterns (if any)  ...  learnt by MLLMs and (v) augmenting the (often) limited capacity of MLLMs to improve their performance on seen or even unseen languages.  ...  Investigating cross-lingual alignment methods for contextualized embeddings with  ...  an unseen dialect? a case study on north african arabizi.  ... 
arXiv:2107.00676v2 fatcat:jvvt6wwitvg2lc7bmttvv3aw6m

Towards generalisable hate speech detection: a review on obstacles and solutions

Wenjie Yin, Arkaitz Zubiaga
2021 PeerJ Computer Science  
With online hate speech on the rise, its automatic detection as a natural language processing task is gaining increasing interest.  ...  the main obstacles, and then proposes directions of future research to improve generalisation in hate speech detection.  ...  The cross-lingual case Most of these studies only worked with English data.  ... 
doi:10.7717/peerj-cs.598 fatcat:lhjvmezkdbanje6kycnxet6vmm

Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing [article]

Edoardo Maria Ponti, Helen O'Horan, Yevgeni Berzak, Ivan Vulić, Roi Reichart, Thierry Poibeau, Ekaterina Shutova, Anna Korhonen
2020 arXiv   pre-print
A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources.  ...  We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-employment of the typological features included in them.  ...  with real-valued multi-dimensional word embeddings and hidden states.  ... 
arXiv:1807.00914v3 fatcat:3b5vklsb6zfmrlifhtpvlxbk6q

Low-Resource Adaptation of Neural NLP Models [article]

Farhad Nooralahzadeh
2020 arXiv   pre-print
These resources are often based on language data available in large quantities, such as English newswire.  ...  To this end, we study distant supervision and sequential transfer learning in various low-resource settings.  ...  K. and Dumais, S. T. (1997). "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge".  ... 
arXiv:2011.04372v1 fatcat:626mbe5ba5bkdflv755o35u5pq

Learning and Evaluating Emotion Lexicons for 91 Languages [article]

Sven Buechel, Susanna Rücker, Udo Hahn
2020 arXiv   pre-print
Emotion lexicons describe the affective meaning of words and thus constitute a centerpiece for advanced sentiment and emotion analysis.  ...  Our approach requires nothing but a source language emotion lexicon, a bilingual word translation model, and a target language embedding model.  ...  This work was partially funded by the German Federal Ministry for Economic Affairs and Energy (funding line "Big Data in der makroökonomischen Analyse" [Big data in macroeconomic analysis]; Fachlos 2;  ... 
arXiv:2005.05672v1 fatcat:sknca2vkgban7ioyuucxg2ek4u