Managing Bias in Human-Annotated Data: Moving Beyond Bias Removal [article]

Gianluca Demartini, Kevin Roitero, Stefano Mizzaro
2021 arXiv   pre-print
Due to the widespread use of data-powered systems in our everyday lives, the notions of bias and fairness have gained significant attention among researchers and practitioners, in both industry and academia. Such issues typically emerge from the data used to train systems, which comes with varying levels of quality. With the commercialization and employment of such systems, which are sometimes delegated to make life-changing decisions, a significant effort is being made towards the identification and removal of possible sources of bias that may surface to the final end-user. In this position paper, we instead argue that bias is not something that should necessarily be removed in all cases, and the attention and effort should shift from bias removal to the identification, measurement, indexing, surfacing, and adjustment of bias, which we name bias management. We argue that if correctly managed, bias can be a resource that can be made transparent to the users and empower them to make informed choices about their experience with the system.
arXiv:2110.13504v1 fatcat:bgjgqnllxfeuxbrw2ifgnrmdrq

DiLBERT: Cheap Embeddings for Disease Related Medical NLP

Kevin Roitero, Beatrice Portelli, Mihai Horia Popescu, Vincenzo Della Mea
2021 IEEE Access  
Electronic Health Records include health-related information, among which there is text mentioning health conditions and diagnoses. Usually, text is also coded using appropriate terminologies and classifications. The act of coding is time consuming and prone to mistakes. Consequently, there is increasing demand for clinical text mining tools to help coding. In the last few years, Natural Language Processing (NLP) models have been shown to be effective in sentence-level tasks. Taking advantage of the transfer learning capabilities of those models, a number of biomedicine and health specific models have also been developed. However, even biomedical models can be seen as too general for some specific areas like diagnostic expressions. In this paper, we describe a BERT model specialized on tasks related to diagnoses and health conditions. To obtain a disease-related language model, we created a pre-training corpus starting from ICD-11 entities, and enriched them with documents selected by querying PubMed and Wikipedia with entity names. Fine-tuning has been carried out towards three downstream tasks on two different datasets. Results show that our model, despite being trained on a much smaller corpus than state-of-the-art algorithms, leads to comparable or higher accuracy scores on all the considered tasks, in particular 97.53% accuracy on death certificate coding, and 81.32% on clinical document coding, which are both slightly higher than other models. To summarize the practical implications of our work, we pre-trained and fine-tuned a domain specific BERT model on a small corpus, with comparable or better performance than state-of-the-art models. This approach may also simplify the development of models for languages other than English, due to the smaller quantity of data needed for training.
doi:10.1109/access.2021.3131386 fatcat:dckasazw4bcq3b4mnc3ex56wda

Crowdsourcing Truthfulness: The Impact of Judgment Scale and Assessor Bias [chapter]

David La Barbera, Kevin Roitero, Gianluca Demartini, Stefano Mizzaro, Damiano Spina
2020 Lecture Notes in Computer Science  
Roitero et al. [10] use crowdsourcing to study user perception of fake news statements.  ... 
doi:10.1007/978-3-030-45442-5_26 fatcat:jjpzxi566vcznjhbny6s3vqv4q

Towards Stochastic Simulations of Relevance Profiles

Kevin Roitero, Andrea Brunello, Julián Urbano, Stefano Mizzaro
2019 Proceedings of the 28th ACM International Conference on Information and Knowledge Management - CIKM '19  
Recently proposed methods allow the generation of simulated scores representing the values of an effectiveness metric, but they do not investigate the generation of the actual lists of retrieved documents. In this paper we address this limitation: we present an approach that exploits an evolutionary algorithm and, given a metric score, creates a simulated relevance profile (i.e., a ranked list of relevance values) that produces that score. We show how the simulated relevance profiles are realistic under various analyses.
doi:10.1145/3357384.3358123 dblp:conf/cikm/RoiteroBUM19 fatcat:iuitr5r5x5bi5ljqeqa2kxsq2q

All Those Wasted Hours

Lei Han, Kevin Roitero, Ujwal Gadiraju, Cristina Sarasua, Alessandro Checco, Eddy Maddalena, Gianluca Demartini
2019 Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining - WSDM '19  
Crowdsourcing has become a standard methodology to collect manually annotated data such as relevance judgments at scale. On crowdsourcing platforms like Amazon MTurk or FigureEight, crowd workers select tasks to work on based on different dimensions such as task reward and requester reputation. Requesters then receive the judgments of workers who self-selected into the tasks and completed them successfully. Several crowd workers, however, preview tasks, begin working on them, reaching varying stages of task completion without finally submitting their work. Such behavior results in unrewarded effort which remains invisible to requesters. In this paper, we conduct the first investigation into the phenomenon of task abandonment, the act of workers previewing or beginning a task and deciding not to complete it. We follow a threefold methodology which includes 1) investigating the prevalence and causes of task abandonment by means of a survey over different crowdsourcing platforms, 2) data-driven analyses of logs collected during a large-scale relevance judgment experiment, and 3) controlled experiments measuring the effect of different dimensions on abandonment. Our results show that task abandonment is a widespread phenomenon. Apart from accounting for a considerable amount of wasted human effort, this bears important implications on the hourly wages of workers as they are not rewarded for tasks that they do not complete. We also show how task abandonment may have strong implications on the use of collected data (for example, on the evaluation of IR systems).
doi:10.1145/3289600.3291035 dblp:conf/wsdm/HanRGSCMD19 fatcat:qpgkncqpyrbzlm4hystk6vndgm

Detection of HER2 from Haematoxylin-Eosin Slides Through a Cascade of Deep Learning Classifiers via Multi-Instance Learning

David La Barbera, Kevin Roitero, Vincenzo Della Mea
2020 Journal of Imaging  
Breast cancer is the most frequently diagnosed cancer in women. The correct identification of the HER2 receptor is a matter of major importance when dealing with breast cancer: an over-expression of HER2 is associated with aggressive clinical behaviour; moreover, HER2 targeted therapy results in a significant improvement in the overall survival rate. In this work, we employ a pipeline based on a cascade of deep neural network classifiers and multi-instance learning to detect the presence of HER2 from Haematoxylin–Eosin slides, which partly mimics the pathologist's behaviour by first recognizing cancer and then evaluating HER2. Our results show that the proposed system presents a good overall effectiveness. Furthermore, the system design is open to further improvements that can be easily deployed in order to increase the effectiveness score.
doi:10.3390/jimaging6090082 pmid:34460739 fatcat:amr6cho2dfghdldhut7umxkeky

Crowd Worker Strategies in Relevance Judgment Tasks

Lei Han, Eddy Maddalena, Alessandro Checco, Cristina Sarasua, Ujwal Gadiraju, Kevin Roitero, Gianluca Demartini
2020 Proceedings of the 13th International Conference on Web Search and Data Mining  
Crowdsourcing is a popular technique to collect large amounts of human-generated labels, such as relevance judgments used to create information retrieval (IR) evaluation collections. Previous research has shown how collecting high quality labels from a crowdsourcing platform can be challenging. Existing quality assurance techniques focus on answer aggregation or on the use of gold questions where ground-truth data allows checking the quality of the responses. In this paper, we present qualitative and quantitative results, revealing how different crowd workers adopt different work strategies to complete relevance judgment tasks efficiently and their consequent impact on quality. We delve into the techniques and tools that highly experienced crowd workers use to be more efficient in completing crowdsourcing micro-tasks. To this end, we use both qualitative results from worker interviews and surveys, as well as the results of a data-driven study of behavioral log data (i.e., clicks, keystrokes and keyboard shortcuts) collected from crowd workers performing relevance judgment tasks. Our results highlight the presence of frequently used shortcut patterns that can speed up task completion, thus increasing the hourly wage of efficient workers. We observe how crowd work experiences result in different types of working strategies, productivity levels, quality and diversity of the crowdsourced judgments.
doi:10.1145/3336191.3371857 dblp:conf/wsdm/0003MCSGRD20 fatcat:csumgnwrprhynn64dgz6rkagsq

The Many Dimensions of Truthfulness: Crowdsourcing Misinformation Assessments on a Multidimensional Scale [article]

Michael Soprano, Kevin Roitero, David La Barbera, Davide Ceolin, Damiano Spina, Stefano Mizzaro, Gianluca Demartini
2021 arXiv   pre-print
Roitero et al. [48] followed the same approach of Roitero et al. [47] to study if the crowd can reliably assess misinformation statements related to the COVID-19 pandemic.  ...  Barack Obama 2009 true "Under this government, the tax to GDP ratio has, in the period weve been in office, [been] an average of 22.7 per cent" Kevin Rudd 2013 positive / Checks Out RQ1  ... 
arXiv:2108.01222v1 fatcat:26prpndntffkfmjrfoabd3wz5m

Can The Crowd Identify Misinformation Objectively? The Effects of Judgment Scale and Assessor's Background [article]

Kevin Roitero, Michael Soprano, Shaoyang Fan, Damiano Spina, Stefano Mizzaro, Gianluca Demartini
2020 arXiv   pre-print
[30], Roitero et al. [40], and Roitero et al. [41]. Also in this case, this behavior is consistent when considering separately PolitiFact and ABC documents (not shown).  ...  Roitero et al. [40] and La Barbera et al. [28] recently studied how users perceive fake news statements.  ... 
arXiv:2005.06915v1 fatcat:u3utoxzp5ncx3f5dvrvk2zsfy4

Can the Crowd Judge Truthfulness? A Longitudinal Study on Recent Misinformation about COVID-19 [article]

Kevin Roitero, Michael Soprano, Beatrice Portelli, Massimiliano De Luise, Damiano Spina, Vincenzo Della Mea, Giuseppe Serra, Stefano Mizzaro, Gianluca Demartini
2021 arXiv   pre-print
Roitero et al.  ...  Roitero et al.  ... 
arXiv:2107.11755v1 fatcat:pxyasrohpvdevcnbfx42utklfy

The COVID-19 Infodemic: Can the Crowd Judge Recent Misinformation Objectively? [article]

Kevin Roitero, Michael Soprano, Beatrice Portelli, Damiano Spina, Vincenzo Della Mea, Giuseppe Serra, Stefano Mizzaro, Gianluca Demartini
2020 pre-print
Roitero et al.  ...  Recent work by Roitero et al.  ... 
doi:10.1145/3340531.3412048 arXiv:2008.05701v1 fatcat:rq47sp2qgnczzkemdbovzuqlqq

Fighting the COVID-19 Infodemic: Modeling the Perspective of Journalists, Fact-Checkers, Social Media Platforms, Policy Makers, and the Society [article]

Firoj Alam, Shaden Shaar, Fahim Dalvi, Hassan Sajjad, Alex Nikolov, Hamdy Mubarak, Giovanni Da San Martino, Ahmed Abdelali, Nadir Durrani, Kareem Darwish, Abdulaziz Al-Homaid, Wajdi Zaghouani (+5 others)
2021 arXiv   pre-print
Kevin Roitero, Michael Soprano, Shaoyang Fan, Damiano Spina, Stefano Mizzaro, and Gianluca Demartini. 2020. Can the crowd identify misinformation objectively?  ...  Kevin R Canini, Bongwon Suh, and Peter L Pirolli. 2011. Finding credible information sources in social networks based on content and social structure.  ... 
arXiv:2005.00033v5 fatcat:pqmx6nl22jay7kvoiuunq4hrp4

Generating Fact Checking Briefs

Angela Fan, Aleksandra Piktus, Fabio Petroni, Guillaume Wenzek, Marzieh Saeidi, Andreas Vlachos, Antoine Bordes, Sebastian Riedel
2020 Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)   unpublished
Kevin Roitero, Michael Soprano, Shaoyang Fan, Damiano Spina, Stefano Mizzaro, and Gianluca Demartini. 2020. Can the crowd identify misinformation objectively?  ...  Volunteers on the other hand are not considered accurate enough; with access to a search engine, Roitero et al. (2020) report crowdsourced fact check accuracies of around 58%.  ... 
doi:10.18653/v1/2020.emnlp-main.580 fatcat:qhc5fujdyfhrvchjttt6anpnou

Ethics Sheet for Automatic Emotion Recognition and Sentiment Analysis [article]

Saif M. Mohammad
2021 arXiv   pre-print
Bertilsdotter Rosqvist, Hanna, Marianthi Checco, Alessandro, Kevin Roitero, Eddy Kourti, David Jackson-Perry, Charlotte Maddalena, Stefano Mizzaro, and Brownlow, Kirsty Fletcher  ...  Vivien Wong, Lyle Ungar, Daniel Polsky, Psychological language on Twitter Kevin G Volpp, and Raina Merchant. 2019.  ... 
arXiv:2109.08256v2 fatcat:t3scrly2tjfefevnmd2iegqlda

Get Your Vitamin C! Robust Fact Verification with Contrastive Evidence

Tal Schuster, Adam Fisch, Regina Barzilay
2021 Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies   unpublished
Kevin Roitero, Michael Soprano, Beatrice Portelli, Damiano Spina, Vincenzo Della Mea, Giuseppe Serra, Stefano Mizzaro, and Gianluca Demartini. 2020.  ... 
doi:10.18653/v1/2021.naacl-main.52 fatcat:7d4d6efpubgv3n44rmbpdqtmee