A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Multilingual Hierarchical Attention Networks for Document Classification
[article]
2017
arXiv
pre-print
Hierarchical attention networks have recently achieved remarkable performance for document classification in a given language. ...
To this end, we propose multilingual hierarchical attention networks for learning document structures, with shared encoders and/or shared attention mechanisms across languages, using multi-task learning ...
We would also like to thank Sebastião Miranda at Priberam for gathering the news articles from Deutsche Welle and the anonymous reviewers for their helpful suggestions. ...
arXiv:1707.00896v4
fatcat:4beegjtdgvgj3mgxylpxsl2t2e
Multilingual Hierarchical Attention Networks for Document Classification
2017
Zenodo
Hierarchical attention networks have recently achieved remarkable performance for document classification in a given language. ...
To this end, we propose multilingual hierarchical attention networks for learning document structures, with shared encoders and/or attention mechanisms across languages, using multi-task learning and an ...
Background: Hierarchical Attention Networks for Document Classification We adopt the hierarchical attention networks for document representation proposed by Yang et al. (2016), as displayed in Figure ...
doi:10.5281/zenodo.834306
fatcat:42qjk2f5jnaexcsij7hleyh4wq
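The hierarchical attention mechanism these entries build on can be sketched with plain numpy: word-level attention pools word states into sentence vectors, and sentence-level attention pools those into a document vector. This is a minimal illustration of the two-level pooling only (the GRU encoders are omitted); all dimensions, names, and the random inputs are illustrative, not the papers' exact parameterization.

```python
import numpy as np

def attention_pool(H, u):
    """Attention-pool a sequence of hidden states H (n, d) using a
    context vector u (d,): softmax over scores, then a weighted sum."""
    scores = H @ u                        # (n,) unnormalized scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                          # attention weights, sum to 1
    return w @ H                          # (d,) pooled vector

def han_document_vector(doc, u_word, u_sent):
    """doc: list of sentences, each an (n_words, d) array of word-level
    hidden states. Word attention builds sentence vectors; sentence
    attention builds the document vector."""
    S = np.stack([attention_pool(H, u_word) for H in doc])  # (n_sents, d)
    return attention_pool(S, u_sent)

rng = np.random.default_rng(0)
d = 8
doc = [rng.normal(size=(5, d)), rng.normal(size=(3, d))]  # 2 sentences
v = han_document_vector(doc, rng.normal(size=d), rng.normal(size=d))
print(v.shape)  # (8,)
```

In the multilingual setting described above, sharing amounts to reusing the same encoder weights and/or the same context vectors `u_word`/`u_sent` across languages during multi-task training.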
Low-Resource Text Classification Using Domain-Adversarial Learning
[chapter]
2018
Lecture Notes in Computer Science
This paper explores the use of domain-adversarial learning as a regularizer to avoid overfitting when training domain-invariant features for deep, complex neural networks in low-resource and zero-resource ...
Their projection into a common space can be learned ad hoc at training time, reaching the final performance of pretrained multilingual word vectors. ...
Hierarchical Attention Network The most complex feature extractor explored in this work is the Hierarchical Attention Network (HAN) presented in [41] , which captures the inherent hierarchical structure ...
doi:10.1007/978-3-030-00810-9_12
fatcat:3hpgz6lzn5gknnoz33whyaihgm
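The core device in domain-adversarial learning of this kind is a gradient reversal layer (in the style of Ganin and Lempitsky): an identity in the forward pass that negates and scales the gradient in the backward pass, so the feature extractor learns representations that confuse a domain (or language) classifier. A toy sketch, with the class name and `lam` scaling factor chosen here for illustration rather than taken from the paper:

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; multiplies the incoming gradient
    by -lam in the backward pass, pushing the feature extractor to
    produce domain-invariant features."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                      # pass features through unchanged

    def backward(self, grad_out):
        return -self.lam * grad_out   # reversed, scaled gradient

grl = GradientReversal(lam=0.5)
x = np.array([1.0, 2.0])
g = np.array([0.3, -0.1])
print(grl.forward(x))   # [1. 2.]
print(grl.backward(g))  # [-0.15  0.05]
```

Used as a regularizer, the reversed gradient from the domain classifier is simply added to the task gradient when updating the shared encoder.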
Expanding the Text Classification Toolbox with Cross-Lingual Embeddings
[article]
2019
arXiv
pre-print
For all architectures, types of word embeddings and datasets, we notice a consistent gain trend in favor of multilingual joint training, especially for low-resourced languages. ...
In particular, we test the hypothesis that embeddings with context are more effective, by multi-tasking the learning of multilingual word embeddings and text classification; we explore neural architectures ...
The left-hand side finetunes multilingual embeddings using sentence alignment while the right-hand side optimizes for document classification using hierarchical bidirectional GRU attention network. ...
arXiv:1903.09878v2
fatcat:h3ho57z64bea5e3f36lfh2lymy
Development of a multilingual text mining approach for knowledge discovery in patents
2009
2009 IEEE International Conference on Systems, Man and Cybernetics
The preliminary results show that our platform has potential for retrieval and relatedness evaluation of multilingual patent documents. ...
These multilingual patent documents could then be mapped into the semantic vector space for evaluating their similarity by means of text clustering techniques. ...
Trappey [28] developed a platform for patent document classification and search using a back-propagation neural network. ...
doi:10.1109/icsmc.2009.5345953
dblp:conf/smc/LeeYL09
fatcat:dabjuw7frveg7cm2s2tjkngaha
GILE: A Generalized Input-Label Embedding for Text Classification
2019
Transactions of the Association for Computational Linguistics
We evaluate models on full-resource and low- or zero-resource text classification of multilingual news and biomedical text with a large label set. ...
The model consists of a joint non-linear input-label embedding with controllable capacity and a joint-space-dependent classification unit, which is trained with cross-entropy loss to optimize classification ...
We would also like to thank our action editor, Eneko Agirre, and the anonymous reviewers for their invaluable suggestions and feedback. ...
doi:10.1162/tacl_a_00259
fatcat:7ooluwfwurfk7mw67xzzl7lmre
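The idea of a joint input-label embedding is that a document vector and a label embedding are projected into a shared space and scored there, so labels unseen at training time can still be scored as long as they have an embedding. A minimal sketch of one common multiplicative parameterization, g(x, y) = w · tanh(Ux ⊙ Ve_y); the exact form, dimensions, and weights here are illustrative assumptions, not necessarily GILE's precise architecture:

```python
import numpy as np

def input_label_scores(x, E, U, V, w):
    """Score a document vector x (d,) against every label embedding in
    E (n_labels, k) through a joint non-linear space of size j:
    g(x, y) = w . tanh((U x) * (V e_y)), elementwise product."""
    hx = U @ x                    # (j,) projected input
    hy = E @ V.T                  # (n_labels, j) projected labels
    return np.tanh(hx * hy) @ w   # (n_labels,) compatibility scores

rng = np.random.default_rng(1)
d, k, j, n_labels = 16, 10, 12, 5
x = rng.normal(size=d)                  # document representation
E = rng.normal(size=(n_labels, k))      # one embedding per label
scores = input_label_scores(x, E,
                            rng.normal(size=(j, d)),
                            rng.normal(size=(j, k)),
                            rng.normal(size=j))
print(scores.shape)  # (5,)
```

Because the score depends only on the label's embedding, adding a new label at test time is a matter of appending a row to `E`.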
GILE: A Generalized Input-Label Embedding for Text Classification
[article]
2019
arXiv
pre-print
We evaluate models on full-resource and low- or zero-resource text classification of multilingual news and biomedical text with a large label set. ...
The model consists of a joint non-linear input-label embedding with controllable capacity and a joint-space-dependent classification unit which is trained with cross-entropy loss to optimize classification ...
We would also like to thank our action editor, Eneko Agirre, and the anonymous reviewers for their invaluable suggestions and feedback. ...
arXiv:1806.06219v3
fatcat:ymcdnmrp45etlkuja34kdimuey
Low-Resource Text Classification using Domain-Adversarial Learning
[article]
2018
arXiv
pre-print
This paper explores the use of domain-adversarial learning as a regularizer to avoid overfitting when training domain-invariant features for deep, complex neural networks in low-resource and zero-resource ...
Their projection into a common space can be learned ad hoc at training time, reaching the final performance of pretrained multilingual word vectors. ...
Attention Network. ...
arXiv:1807.05195v1
fatcat:mwww4sjohja6lhklajbi7fb6qi
A Systematic Comparison of Architectures for Document-Level Sentiment Classification
[article]
2022
arXiv
pre-print
In this work we empirically compare hierarchical models and transfer learning for document-level sentiment classification. ...
At the same time, transfer learning models based on language model pretraining have shown promise for document classification. ...
Acknowledgements This work has been carried out as part of the SANT project (Sentiment Analysis for Norwegian Text), funded by the Research Council of Norway (grant number 270908). ...
arXiv:2002.08131v2
fatcat:zxr5lrs25zd7rgjphuyze6rbp4
DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing
[article]
2021
arXiv
pre-print
Experimental results show that our model achieves state-of-the-art performance on document-level multilingual RST parsing in all sub-tasks. ...
In this work, we propose a document-level multilingual RST discourse parsing framework, which conducts EDU segmentation and discourse tree parsing jointly. ...
We thank Ai Ti Aw for the insightful discussions and Chloé Braud for sharing linguistic resources. ...
arXiv:2110.04518v1
fatcat:jl4ugdztdbe77gxxi63q4dhbja
Multilingual Neural RST Discourse Parsing
[article]
2020
arXiv
pre-print
However, the parsing tasks for other languages such as German, Dutch, and Portuguese are still challenging due to the shortage of annotated data. ...
Experiment results show that both methods are effective even with limited training data, and achieve state-of-the-art performance on cross-lingual, document-level discourse parsing on all sub-tasks. ...
Webber for insightful discussions, C. Braud for sharing linguistic resources, and the anonymous reviewers for their precious feedback to help improve and extend this piece of work. ...
arXiv:2012.01704v1
fatcat:rfx4sqosife7fdc5zn7yaxz3qu
Enriching BERT with Knowledge Graph Embeddings for Document Classification
[article]
2019
arXiv
pre-print
Compared to the standard BERT approach we achieve considerably better results for the classification task. ...
For a more coarse-grained classification using eight labels we achieve an F1-score of 87.20, while a detailed classification using 343 labels yields an F1-score of 64.70. ...
We would like to thank the anonymous reviewers for comments on an earlier version of this manuscript. ...
arXiv:1909.08402v1
fatcat:ijn5hp23rreoflh2b3ijhezsf4
A Latent Semantic Indexing-based approach to multilingual document clustering
2008
Decision Support Systems
Motivated by the significance of this demand, this study designs a Latent Semantic Indexing (LSI)-based MLDC technique capable of generating knowledge maps (i.e., document clusters) from multilingual documents ...
However, as a result of increased globalization and advances in Internet technology, an organization often maintains documents in different languages in its knowledge repositories, which necessitates multilingual ...
The multilingual ontology-based MLDC approach requires the use of a classification scheme, which contains multilingual documents as a training set for each class. ...
doi:10.1016/j.dss.2007.07.008
fatcat:dqy7qazebvb4bfw5hajlgmtmoe
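The LSI step described in this entry reduces to a truncated SVD of the term-document matrix: documents (in any language, once terms are aligned) are projected into a low-rank semantic space where similarity can be compared. A minimal numpy sketch; the toy matrix and rank are illustrative, not the paper's data:

```python
import numpy as np

def lsi_project(X, k):
    """Project a term-document matrix X (terms, docs) into a
    k-dimensional latent semantic space via truncated SVD.
    Returns one k-dimensional coordinate row per document."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T   # (docs, k)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy term-document matrix: rows are (mixed-language) terms, columns
# are documents. Docs 0 and 1 share vocabulary; doc 2 does not.
X = np.array([[2., 1., 0.],
              [1., 2., 0.],
              [0., 0., 3.],
              [0., 0., 2.]])
D = lsi_project(X, k=2)
print(cosine(D[0], D[1]) > cosine(D[0], D[2]))  # True
```

Clustering the rows of `D` (e.g. with k-means) then yields the document clusters, i.e. the knowledge map described above.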
Multi-granular Legal Topic Classification on Greek Legislation
[article]
2021
arXiv
pre-print
Finally, we show that cutting-edge multilingual and monolingual transformer-based models compete at the top of the classifiers' ranking, making us question the necessity of training monolingual transfer ...
To the best of our knowledge, this is the first time the task of Greek legal text classification has been considered in an open research project, while Greek is also a language with very limited NLP resources ...
Undavia et al. (2018) applied neural networks on legal document classification in a similar task, classification of legal court opinions. ...
arXiv:2109.15298v1
fatcat:if3aiggauvcm5h7uh3h6sckfte
Hierarchical Document Encoder for Parallel Corpus Mining
2019
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)
We explore using multilingual document embeddings for nearest neighbor mining of parallel data. ...
iii) a hierarchical multilingual document encoder (HiDE) that builds on our sentence-level model. ...
Acknowledgements We are grateful to the anonymous reviewers and our teammates in Descartes and Google Translate for their valuable discussions, especially Chris Tar, Gustavo Adolfo Hernandez Abrego, and ...
doi:10.18653/v1/w19-5207
dblp:conf/wmt/GuoYSCGSSK19
fatcat:5rjdxuh5fvbnhh2pryxtfjfmgu
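Nearest-neighbor mining with document embeddings, as in this entry, comes down to a cosine-similarity search: for each source-language document embedding, take the most similar target-language embedding and keep the pair if it clears a threshold. A self-contained sketch with toy embeddings; the function name and threshold are illustrative, and real systems use margin-based scoring and approximate search at scale:

```python
import numpy as np

def mine_parallel(E_src, E_tgt, threshold=0.5):
    """For each row of E_src, find the nearest row of E_tgt by cosine
    similarity; keep (src, tgt, score) pairs above the threshold."""
    A = E_src / np.linalg.norm(E_src, axis=1, keepdims=True)
    B = E_tgt / np.linalg.norm(E_tgt, axis=1, keepdims=True)
    sims = A @ B.T                       # (n_src, n_tgt) cosine matrix
    nn = sims.argmax(axis=1)             # nearest target per source
    return [(i, j, sims[i, j]) for i, j in enumerate(nn)
            if sims[i, j] >= threshold]

# Toy embeddings: source doc 0 aligns with target doc 1, and vice versa.
E_src = np.array([[1.0, 0.1], [0.1, 1.0]])
E_tgt = np.array([[0.1, 1.0], [1.0, 0.1]])
pairs = mine_parallel(E_src, E_tgt)
print([(i, j) for i, j, _ in pairs])  # [(0, 1), (1, 0)]
```

A hierarchical encoder in the spirit of HiDE would produce `E_src`/`E_tgt` by composing sentence embeddings into document embeddings before this mining step.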
Showing results 1–15 out of 3,718 results