EVALITA4ELG: Italian Benchmark Linguistic Resources, NLP Services and Tools for the ELG Platform

Viviana Patti, Valerio Basile, Cristina Bosco, Rossella Varvara, Michael Fell, Andrea Bolioli, Alessio Bosca
2020 Italian Journal of Computational Linguistics  
This paper describes the EVALITA4ELG project, whose main aim is at systematically collecting the resources released as benchmarks for this evaluation campaign, and making them easily accessible through  ...  The collection is moreover integrated with systems and baselines as a pool of web services with a common interface, deployed on a dedicated hardware infrastructure.  ...  The European Language Grid project has received funding from the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement no. 825627 (ELG).  ... 
doi:10.4000/ijcol.754 fatcat:qiwz4yjaf5ecfg5k4tdkfxlcii

Comparing Deep-Learning Architectures and Traditional Machine-Learning Approaches for Satire Identification in Spanish Tweets

Óscar Apolinario-Arzube, José Antonio García-Díaz, José Medina-Moreira, Harry Luna-Aveiga, Rafael Valencia-García
2020 Mathematics  
Our results suggest that term-counting features and traditional machine learning models provide competitive results regarding automatic satire identification, slightly outperforming state-of-the-art models  ...  For this, the state-of-the-art relies on neural networks fed with word embeddings that are capable of learning interesting characteristics regarding the way humans communicate.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/math8112075 doaj:b51bf8fa62114428b4085d8d27ef4f3b fatcat:zdvkhyavbjgqxppwke4ufx3m7m

A novel COVID-19 data set and an effective deep learning approach for the de-identification of Italian medical records

Rosario Catelli, Francesco Gargiulo, Valentina Casola, Giuseppe De Pietro, Hamido Fujita, Massimo Esposito
2021 IEEE Access  
of their content in accordance with the restrictions imposed by both national and supranational privacy authorities.  ...  However, the lack of data sets in other languages has strongly limited their applicability and performance evaluation.  ...  and classification.  ... 
doi:10.1109/access.2021.3054479 pmid:34786303 pmcid:PMC8545240 fatcat:ogndkjscqzfm5eqa4tymglqpwq

SWSR: A Chinese Dataset and Lexicon for Online Sexism Detection [article]

Aiqi Jiang, Xiaohan Yang, Yang Liu, Arkaitz Zubiaga
2021 arXiv   pre-print
We conduct experiments for the three sexism classification tasks making use of state-of-the-art machine learning models.  ...  While research in the sexism detection domain is growing, most of this research focuses on English as the language and on Twitter as the platform.  ...  French Sexism direct, descriptive reporting, non-sexist no decision 12k 2020 [28] MeTwo Spanish Sexism sexist, not-sexist doubtful 3600 2020 [15] EXIST@IberLEF English Spanish  ... 
arXiv:2108.03070v1 fatcat:7pnrrr54xrd6jc6hhacglht43e

RigoBERTa: A State-of-the-Art Language Model For Spanish [article]

Alejandro Vaca Serrano, Guillem Garcia Subies, Helena Montoro Zamorano, Nuria Aldama Garcia, Doaa Samy, David Betancur Sanchez, Antonio Moreno Sandoval, Marta Guerrero Nieto, Alvaro Barbero Jimenez
2022 arXiv   pre-print
RigoBERTa performance is assessed over 13 NLU tasks in comparison with other available Spanish language models, namely, MarIA, BERTIN and BETO.  ...  RigoBERTa outperformed the three models in 10 out of the 13 tasks, achieving new "State-of-the-Art" results.  ...  Acknowledgments We would like to acknowledge the support of this project by Instituto de Ingeniería del Conocimiento, as without their faith on us and their investment, this project would not have been  ... 
arXiv:2205.10233v1 fatcat:ickgk2osgrgvllo3dlucbnrezm

Multi-task Learning of Negation and Speculation for Targeted Sentiment Classification [article]

Andrew Moore, Jeremy Barnes
2021 arXiv   pre-print
We release both the datasets and the source code at https://github.com/jerbarnes/multitask_negation_for_targeted_sentiment.  ...  In this paper, we propose a multi-task learning method to incorporate information from syntactic and semantic auxiliary tasks, including negation and speculation scope detection, to create English-language  ...  Acknowledgements This work has been carried out as part of the SANT project (Sentiment Analysis for Norwegian Text), funded by the Research Council of Norway (grant  ... 
arXiv:2010.08318v2 fatcat:z7b3fmynobarzmj344hadthupu

Gender Bias in Text: Labeled Datasets and Lexicons [article]

Jad Doughman, Wael Khreich
2022 arXiv   pre-print
However, there is a lack of gender bias datasets and lexicons for automating the detection of gender bias using supervised and unsupervised machine learning (ML) and natural language processing (NLP) techniques  ...  Therefore, the main contribution of this work is to publicly provide labeled datasets and exhaustive lexicons by collecting, annotating, and augmenting relevant sentences to facilitate the detection of  ...  in Ibereval 2018 [14] and Evalita 2020 [13] provided datasets in English, Spanish, and Italian to detect misogynistic content, to classify misogynous behaviour as well as to identify the target of  ... 
arXiv:2201.08675v1 fatcat:iv66itdd2bhyhevb5vdh5v46wy

On the Use of Parsing for Named Entity Recognition

Miguel A. Alonso, Carlos Gómez-Rodríguez, Jesús Vilares
2021 Applied Sciences  
It is intuitive that the structure of a text can be helpful to determine whether or not a certain portion of it is an entity and if so, to establish its concrete limits.  ...  we review the different approaches to NER that make use of syntactic information; and we propose a new way of using parsing in NER based on casting parsing itself as a sequence labeling task.  ...  Examples from 2020 are the Spanish CANTEMIST-NER shared task, with 23 participating teams, featured in IberLEF 2020 [97]; or the English W-NUT-2020 Task 1, with 13 participants, featured in EMNLP 2020  ... 
doi:10.3390/app11031090 fatcat:6suu3nnvonfzbmmi3uywcts5xq

A Language Model for Misogyny Detection in Latin American Spanish Driven by Multisource Feature Extraction and Transformers

Edwin Aldana-Bobadilla, Alejandro Molina-Villegas, Yuridia Montelongo-Padilla, Ivan Lopez-Arevalo, Oscar S. Sordia
2021 Applied Sciences  
The complexity of recognizing misogyny through computer models lies in the fact that it is a subtle type of violence, it is not always explicitly aggressive, and it can even hide behind seemingly flattering  ...  This research contributes to the development of models for the automatic detection of misogynistic texts in Latin American Spanish and contributes to the design of data augmentation methodologies since  ...  Conflicts of Interest: The authors declare no conflict of interest. Appl. Sci. 2021, 11, 10467  ... 
doi:10.3390/app112110467 fatcat:hrsqz2oe5rbhhnonbjxgtz5eia

Challenging Social Media Threats using Collective Well-being Aware Recommendation Algorithms and an Educational Virtual Companion [article]

Dimitri Ognibene, Davide Taibi, Udo Kruschwitz, Rodrigo Souza Wilkens, Davinia Hernandez-Leo, Emily Theophilou, Lidia Scifo, Rene Alejandro Lobo, Francesco Lomonaco, Sabrina Eimler, H. Ulrich Hoppe, Nils Malzahn
2021 arXiv   pre-print
The full impact of current SM platform design -- both at an individual and societal level -- asks for a comprehensive evaluation and conceptual improvement.  ...  On the other hand however, some serious negative implications of SM have repeatedly been highlighted in recent years, pointing at various SM threats for society, and its teenagers in particular: from common  ...  Kruschwitz, 2020), detections at the span level at Task 5 in SemEval-2021 (Pavlopoulos et al., 2021), and generalisation to social media platforms other than those used in training at EXIST in IberLEF  ... 
arXiv:2102.04211v3 fatcat:ps2pztjcevbardpnkzrbuiia7a

Proceedings of the GermEval 2021 Workshop on the Identification of Toxic, Engaging, and Fact-Claiming Comments [article]

Julian Risch, Anke Stoll, Lena Wilms, Michael Wiegand
2021
We created ensembles of these models and investigated whether and how classification performance depends on the number of ensemble members and their composition.  ...  On out-of-sample data, our best ensemble achieved a macro-F1 score of 0.73 (for all subtasks), and F1 scores of 0.72, 0.70, and 0.76 for subtasks 1, 2, and 3, respectively.  ...  GPUs for training our deep learning models and also to our colleagues at Precog 20 for constant support and feedback.  ... 
doi:10.48415/2021/fhw5-x128 fatcat:u3fcq4x23jba7ic2a5ldcsdbna

Multi-task Learning of Negation and Speculation for Targeted Sentiment Classification

Andrew Moore, Jeremy Barnes
2021 Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies   unpublished
We release both the datasets and the source code at https://github.com/jerbarnes/multitask_negation_for_targeted_sentiment.  ...  In this paper, we propose a multi-task learning method to incorporate information from syntactic and semantic auxiliary tasks, including negation and speculation scope detection, to create English-language  ...  Acknowledgements This work has been carried out as part of the SANT project (Sentiment Analysis for Norwegian Text), funded by the Research Council of Norway (grant number 270908).  ... 
doi:10.18653/v1/2021.naacl-main.227 fatcat:zvfxjm2f6jcytcgwcly33od6da