Managing Bias in Human-Annotated Data: Moving Beyond Bias Removal [article]

Gianluca Demartini, Kevin Roitero, Stefano Mizzaro
2021 arXiv   pre-print
In this position paper, we instead argue that bias is not something that should necessarily be removed in all cases, and the attention and effort should shift from bias removal to the identification, measurement  ...  Due to the widespread use of data-powered systems in our everyday lives, the notions of bias and fairness gained significant attention among researchers and practitioners, in both industry and academia  ...  We argue that bias is part of human nature, and that it should be managed rather than removed.  ... 
arXiv:2110.13504v1 fatcat:bgjgqnllxfeuxbrw2ifgnrmdrq

Domain Adaptation for Commitment Detection in Email

Hosein Azarbonyad, Robert Sim, Ryen W. White
2019 Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining - WSDM '19  
We show that transfer learning can help remove domain bias to obtain models with less domain dependence.  ...  in a timely manner.  ...  [26, 28] show that the annotation of commitments and requests in email is challenging, even for humans.  ... 
doi:10.1145/3289600.3290984 dblp:conf/wsdm/AzarbonyadSW19 fatcat:7ooa7am5frejnecffmqnzxdnym

Towards Accuracy-Fairness Paradox: Adversarial Example-based Data Augmentation for Visual Debiasing [article]

Yi Zhang, Jitao Sang
2020 arXiv   pre-print
Our data analysis on facial attribute recognition demonstrates (1) the attribution of model bias from imbalanced training data distribution and (2) the potential of adversarial examples in balancing data  ...  The generated adversarial examples supplement the target task training dataset via balancing the distribution over bias variables in an online fashion.  ...  The goal is to remove the model's gender bias in facial attribute classification.  ... 
arXiv:2007.13632v2 fatcat:3zbxitm6bbbgllkvmk3lfmwfoe

Managing scientific data

Anastasia Ailamaki
2011 Proceedings of the 2011 international conference on Management of data - SIGMOD '11  
Still, the data-management community aspires to general-purpose scientific data management.  ...  Proposed solutions also promise to achieve efficient management for almost any other kind of data.  ...  Many operations can be applied in a pipeline manner as data is generated or move around.  ... 
doi:10.1145/1989323.1989433 dblp:conf/sigmod/Ailamaki11 fatcat:3cxviarugnfoldnqz3csgjczqu

Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries

Alexandra Olteanu, Carlos Castillo, Fernando Diaz, Emre Kıcıman
2019 Frontiers in Big Data  
Many academics and practitioners have warned against the naïve usage of social data. There are biases and inaccuracies occurring at the source of the data, but also introduced during processing.  ...  We identify a variety of menaces in the practices around social data use, and organize them in a framework that helps to identify them.  ...  In this paper, we use the term mainly in its more statistical sense to refer to biases in social data and social data analyses (see our working definition of data bias in section 3.1).  ... 
doi:10.3389/fdata.2019.00013 pmid:33693336 pmcid:PMC7931947 fatcat:yhvqij6yyvhjjcwt7u4h6oq6au

Managing scientific data

Anastasia Ailamaki, Verena Kantere, Debabrata Dash
2010 Communications of the ACM  
Still, the data-management community aspires to general-purpose scientific data management.  ...  Proposed solutions also promise to achieve efficient management for almost any other kind of data.  ...  Many operations can be applied in a pipeline manner as data is generated or move around.  ... 
doi:10.1145/1743546.1743568 fatcat:vw57d23aorchtntng6jlrccs6y

Spatiotemporal Data Mining: A Survey on Challenges and Open Problems [article]

Ali Hamdi, Khaled Shaban, Abdelkarim Erradi, Amr Mohamed, Shakila Khan Rumi, Flora Salim
2021 arXiv   pre-print
Specifically, we investigate the challenging issues in regards to spatiotemporal relationships, interdisciplinarity, discretisation, and data characteristics.  ...  Moreover, we discuss the limitations in the literature and open research problems related to spatiotemporal data representations, modelling and visualisation, and comprehensiveness of approaches.  ...  ., does not require human annotation [267, 153, 104] or semi-supervised, i.e., requires to annotate object in the first frame only [33, 41].  ... 
arXiv:2103.17128v1 fatcat:ci5pt5bytndr5inolznjsaizpi

Big Data Ethics: A Life Cycle Perspective

Simon Vydra, Andrei Poama, Sarah Giest, Alex Ingrams, Bram Klievink
2021 Erasmus Law Review  
Use and combining of structured (traditional) and less structured or unstructured (nontraditional) data in analysis activities; 3. Use of incoming data streams in real time or near real time; 4.  ...  Innovative use of existing datasets and/or data sources for new and radically different applications than the data were gathered for or spring from.  ...  Non-expert human annotators are slightly less accurate than COMPAS (62.8%) individually but more accurate than COMPAS when aggregating multiple annotators together (67%).  ... 
doi:10.5553/elr.000190 fatcat:74dl3c4psbh3nmmnjzfe3q6bn4

Unsupervised Semantic Mapping for Healthcare Data Storage Schema

Fahad Ahmed Satti, Musarrat Hussain, Jamil Hussain, Syed Imran Ali, Taqdir Ali, Hafiz Syed Muhammad Bilal, Taechoong Chung, Sungyoung Lee
2021 IEEE Access  
Even before annotation, the dataset is bias in favour of unrelated attributes.  ...  As a result, the annotations were kept anonymous so as not to induce any bias.  ... 
doi:10.1109/access.2021.3100686 fatcat:oesdl342w5bv3psmemokoocvwm

Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries

Alexandra Olteanu, Carlos Castillo, Fernando Diaz, Emre Kiciman
2016 Social Science Research Network  
Many academics and practitioners have warned against the naïve usage of social data. There are biases and inaccuracies at the source of the data, but also introduced during processing.  ...  We present a framework for identifying a broad range of menaces in the research and practices around social data.  ...  By data bias we mean a systematic distortion in the data.  ... 
doi:10.2139/ssrn.2886526 fatcat:urp4unvmsbgnpfsg46g75ywjxy

Review of Current Methods, Applications, and Data Management for the Bioinformatics Analysis of Whole Exome Sequencing

Riyue Bao, Lei Huang, Jorge Andrade, Wei Tan, Warren A. Kibbe, Hongmei Jiang, Gang Feng
2014 Cancer Informatics  
In addition, we briefly discuss the current status and solutions for big data management, analysis, and summarization in the field of bioinformatics. keywords: big data, InDel, next generation sequencing  ...  , annotation, and prioritization).  ...  We will also discuss challenges in large-scale NGS data analysis and management.  ... 
doi:10.4137/cin.s13779 pmid:25288881 pmcid:PMC4179624 fatcat:wrlfacy7hzfu7pllgdyifnjphy

Mining Social Media Data for Biomedical Signals and Health-Related Behavior

Rion Brattig Correia, Ian B. Wood, Johan Bollen, Luis M. Rocha
2020 Annual Review of Biomedical Data Science  
Here we review recent work in mining social media for biomedical, epidemiological, and social phenomena information relevant to the multilevel complexity of human health.  ...  From cohort-level discussions of a condition to population-level analyses of sentiment, social media have provided scientists with unprecedented amounts of data to study human behavior associated with  ...  of publicly available annotated data.  ... 
doi:10.1146/annurev-biodatasci-030320-040844 pmid:32550337 pmcid:PMC7299233 fatcat:ae52gyu4rjebdd3s4mj75lafky

Interactive Data Analytics for the Humanities [chapter]

Iryna Gurevych, Christian M. Meyer, Carsten Binnig, Johannes Fürnkranz, Kristian Kersting, Stefan Roth, Edwin Simpson
2018 Lecture Notes in Computer Science  
In the envisioned interactive systems, human users not only provide annotations to a machine learner, but train a model by using the system and demonstrating the task.  ...  Our vision links natural language processing research with recent advances in machine learning, computer vision, and data management systems, as realizing this vision relies on combining expertise from  ...  data management systems.  ... 
doi:10.1007/978-3-319-77113-7_41 fatcat:lyemn3e2xvdxfgqlavettn2dji

Marble

Joyce C. Ho, Joydeep Ghosh, Jimeng Sun
2014 Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '14  
making, prognosis, and patient management.  ...  Thus it can potentially be used to rapidly characterize, predict, and manage a large number of diseases, thereby promising a novel, data-driven solution that can benefit very large segments of the population  ...  This research is supported by the Schlumberger Centennial Chair in Engineering; Army Research Office under grant W911NF-11-1-0258; and Department of Defense award under award number 60036907.  ... 
doi:10.1145/2623330.2623658 dblp:conf/kdd/HoGS14 fatcat:l4zcy4jmwjc6pgvua4n2ndxqcu

On the data set's ruins

Nicolas Malevé
2020 AI & Society: The Journal of Human-Centred Systems and Machine Intelligence  
Today, a significant amount of computer vision algorithms rely on techniques of machine learning which require large amounts of data assembled in collections, or named data sets.  ...  To build these data sets a large population of precarious workers label and classify photographs around the clock at high speed.  ...  If the work of curating data sets and their annotation is central, so is the management of the populations of workers involved.  ... 
doi:10.1007/s00146-020-01093-w fatcat:uevktqttdfezlbtuakgm7hkw4a