Multilingual and cross-lingual document classification: A meta-learning approach [article]

Niels van der Heijden, Helen Yannakoudakis, Pushkar Mishra, Ekaterina Shutova
2021 arXiv   pre-print
In this work, we propose a meta-learning approach to document classification in limited-resource setting and demonstrate its effectiveness in two different settings: few-shot, cross-lingual adaptation  ...  We conduct a systematic comparison of several meta-learning methods, investigate multiple settings in terms of data availability and show that meta-learning thrives in settings with a heterogeneous task  ...  Our contributions are as follows: 1) We propose a meta-learning approach to few-shot cross-lingual and multilingual adaptation and demonstrate its effectiveness on document classification tasks over traditional  ... 
arXiv:2101.11302v2 fatcat:36utmoigc5dx5dcjmif47av5he

Multilingual and cross-lingual document classification: A meta-learning approach

Niels van der Heijden, Helen Yannakoudakis, Pushkar Mishra, Ekaterina Shutova
2021 Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume   unpublished
In this work, we propose a meta-learning approach to document classification in limited-resource setting and demonstrate its effectiveness in two different settings: few-shot, cross-lingual adaptation to  ...  We conduct a systematic comparison of several meta-learning methods, investigate multiple settings in terms of data availability and show that meta-learning thrives in settings with a heterogeneous task  ...  Our contributions are as follows: 1) We propose a meta-learning approach to few-shot cross-lingual and multilingual adaptation and demonstrate its effectiveness on document classification tasks over traditional  ... 
doi:10.18653/v1/2021.eacl-main.168 fatcat:af2q66rwovhr7pxtqbaka7n7ni

Persian Natural Language Inference: A Meta-learning approach [article]

Heydar Soudani, Mohammad Hassan Mojab, Hamid Beigy
2022 arXiv   pre-print
A powerful method of building functional natural language processing systems for low-resource languages is to combine multilingual pre-trained representations with cross-lingual transfer learning.  ...  In general, however, shared representations are learned separately, either across tasks or across languages. This paper proposes a meta-learning approach for inferring natural language in Persian.  ...  Conneau and Lample (2019) proposed two methods for learning cross-lingual language models, one using monolingual data and the other using parallel data and a new cross-lingual language model objective  ... 
arXiv:2205.08755v1 fatcat:o677jzdvzfbxhnvc47azxk7wtm

A Corpus for Multilingual Document Classification in Eight Languages [article]

Holger Schwenk, Xian Li
2018 arXiv   pre-print
Cross-lingual document classification aims at training a document classifier on resources in one language and transferring it to a different language without any additional resources.  ...  Our goal is to offer a freely available framework to evaluate cross-lingual document classification, and we hope to foster by these means, research in this important area.  ...  This type of training is not a cross-lingual approach any more. Consequently, we will refer to this method as "joint multilingual document classification".  ... 
arXiv:1805.09821v1 fatcat:ymsi6rhfcnco3gtg2mvuoxebzq

Automatic Generation of Language-Independent Features for Cross-Lingual Classification [article]

Sarai Duek, Shaul Markovitch
2018 arXiv   pre-print
The problem of learning from examples in one or more languages and classifying (categorizing) in another is called cross-lingual learning.  ...  In this work, we present a novel approach that solves the general cross-lingual text categorization problem. Our method generates, for each training document, a set of language-independent features.  ...  Related Work Most existing works on cross-lingual text classification assume the CLTC2 setup, where the training documents are written in a source language and the testing documents are written in a different  ... 
arXiv:1802.04028v1 fatcat:cjlxikzxqfaoxfbxz2jqdkudp4

Generalized Funnelling: Ensemble Learning and Heterogeneous Document Embeddings for Cross-Lingual Text Classification [article]

Alejandro Moreo, Andrea Pedrotti, Fabrizio Sebastiani
2022 arXiv   pre-print
Funnelling (Fun) is a recently proposed method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL).  ...  ) for each document, and the final classification decision is taken by a metaclassifier that uses this vector as its input.  ...  [60] demonstrates the effectiveness of meta-learning approaches to cross-lingual text classification.  ... 
arXiv:2110.14764v2 fatcat:4mm6n6crtvgf3ktwmizllw77ci

Funnelling: A New Ensemble Method for Heterogeneous Transfer Learning and its Application to Cross-Lingual Text Classification [article]

Andrea Esuli, Alejandro Moreo, Fabrizio Sebastiani
2019 arXiv   pre-print
Cross-lingual Text Classification (CLC) consists of automatically classifying, according to a common set C of classes, documents each written in one of a set of languages L, and doing so more accurately  ...  We tackle multilabel CLC via funnelling, a new ensemble learning method that we propose here.  ...  SOLVING CROSS-LINGUAL TEXT CLASSIFICATION VIA FUNNELLING We now describe funnelling and its application to multilabel CLC.  ... 
arXiv:1901.11459v2 fatcat:f2p3wy72wnfzdo7inxdrxg2p5a

TwiSE at SemEval-2016 Task 4: Twitter Sentiment Classification

Georgios Balikas, Massih-Reza Amini
2016 Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)  
I believe I have become not only a better researcher but also a better person due to our collaboration.  ...  I am grateful for his support and advice. Our discussions and brainstorming during all these years have deeply influenced me.  ...  , multitask classification and cross-lingual document retrieval.  ... 
doi:10.18653/v1/s16-1010 dblp:conf/semeval/BalikasA16 fatcat:w7o56n5ny5hkjgtnqghp2sdeua

Graph Convolutional Network for Swahili News Classification [article]

Alexandros Kastanos, Tyler Martin
2021 arXiv   pre-print
We follow up on this result by introducing a variant of the Text GCN model which utilises a bag of words embedding rather than a naive one-hot encoding to reduce the memory footprint of Text GCN whilst  ...  work empirically demonstrates the ability of Text Graph Convolutional Network (Text GCN) to outperform traditional natural language processing benchmarks for the task of semi-supervised Swahili news classification  ...  Acknowledgments We thank Mario Ausseloos, Jacob Deasy, and Devin Taylor for their feedback on the draft manuscript.  ... 
arXiv:2103.09325v1 fatcat:6zxdss63trfinncc2tqwxa2rwq

CROSS-LANGUAGE TEXT CLASSIFICATION WITH CONVOLUTIONAL NEURAL NETWORKS FROM SCRATCH

Musbah Zaid Enweiji, Taras Lehinevych, Andrey Glybovets
2017 EUREKA Physics and Engineering  
Cross language classification is an important task in multilingual learning, where documents in different languages often share the same set of categories.  ...  The novel approach by using Convolutional Neural Networks for multilingual language classification is proposed in this article. It learns representation of knowledge gained from languages.  ...  Machine learning is an outstanding approach in automated text classification [1, 2] and categorization [3].  ... 
doi:10.21303/2461-4262.2017.00304 fatcat:uaiofuqjwzhjpdghilsb3jpicu

Large-scale, Language-agnostic Discourse Classification of Tweets During COVID-19 [article]

Oguzhan Gencoglu
2020 arXiv   pre-print
For this purpose, we propose language-agnostic tweet representations to perform large-scale Twitter discourse classification with machine learning.  ...  This is mainly due to limitations of traditional topic modeling algorithms as they usually do not operate in a multilingual or cross-lingual fashion.  ...  as a supervised learning one.  ... 
arXiv:2008.00461v2 fatcat:4esdetxvwrfgtfq4c6e3vtfwla

Harvesting Comparable Corpora and Mining Them for Equivalent Bilingual Sentences Using Statistical Classification and Analogy-Based Heuristics [chapter]

Krzysztof Wołk, Emilia Rejmund, Krzysztof Marasek
2015 Lecture Notes in Computer Science  
Parallel sentences are a relatively scarce but extremely useful resource for many applications including cross-lingual retrieval and statistical machine translation.  ...  Here we propose a web crawling method for building subject-aligned comparable corpora from e.g. Wikipedia dumps and Euronews web page.  ...  Probably the most common approach is based on the retrieval of the cross-lingual information. In the second approach, source documents need to be translated using any machine translation system.  ... 
doi:10.1007/978-3-319-25252-0_46 fatcat:l6y2i4om3jddhilmfoeozdjn6e

Domain Adapted Word Embeddings for Improved Sentiment Classification

Prathusha Kameswara Sarma, Yingyu Liang, Bill Sethares
2018 Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP  
encoding algorithms for classification.  ...  This paper proposes a method to combine the breadth of generic embeddings with the specificity of domain specific embeddings.  ...  Recently, CCA has been applied to perform cross-lingual entity linking tasks (Tsai and Roth, 2016). Most applications of CCA in NLP, as stated above, have focused on multilingual settings.  ... 
doi:10.18653/v1/w18-3407 dblp:conf/acl-deeplo/SarmaLS18 fatcat:67zr6hb2nbekhfulujkzyffmda

Large-Scale, Language-Agnostic Discourse Classification of Tweets During COVID-19

Oguzhan Gencoglu
2020 Machine Learning and Knowledge Extraction  
For this purpose, we propose language-agnostic tweet representations to perform large-scale Twitter discourse classification with machine learning.  ...  This is mainly due to limitations of traditional topic modeling algorithms as they usually do not operate in a multilingual or cross-lingual fashion.  ...  We encode both the training data and 26.8 million tweets using this deep learning approach, ending up with vectors of length 768 for each observation.  ... 
doi:10.3390/make2040032 fatcat:z5yyzynqizefxhgov7dcktpyku

Representation models for text classification

George Giannakopoulos, Petra Mavridi, Georgios Paliouras, George Papadakis, Konstantinos Tserpes
2012 Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics - WIMS '12  
In addition, we consider a novel approach that improves the performance of topic classification across all types of Web documents: namely the n-gram graphs.  ...  In this paper, we provide some insight and a preliminary study on a tripartite categorization of Web documents, based on inherent document characteristics.  ...  and Virtualisation under contract no.257774.  ... 
doi:10.1145/2254129.2254148 dblp:conf/wims/GiannakopoulosMPPT12 fatcat:u4igklrzabelroardrqoxtldcy