A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2020.
Classification algorithms aim to predict an unknown label (e.g., a quality class) for a new instance (e.g., a product). To this end, training samples (instances and labels) are used to deduce classification hypotheses. Often, it is relatively easy to capture instances, but the acquisition of the corresponding labels remains difficult or expensive. Active learning algorithms select the most beneficial instances to be labeled in order to reduce this cost. In research, the labeling procedure is simulated and therefore a ground truth is available. During deployment, however, active learning is a one-shot problem and an evaluation set is not available. Hence, it is not possible to reliably estimate the performance of the classification system during learning, and it is difficult to decide when the system fulfills the quality requirements (stopping criteria). In this article, we formalize the task and review existing strategies for assessing the performance of an actively trained classifier during training. Furthermore, we identify three major challenges: (1) deriving a performance distribution, (2) preserving the representativeness of the labeled subset, and (3) correcting for the sampling bias induced by an intelligent selection strategy. In a qualitative analysis, we evaluate different existing approaches and show that none of them reliably estimates active learning performance, which remains a major challenge for future research on such systems. All plots and experiments are provided in a Jupyter notebook that is available for download. (arXiv:1901.10338)
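The pool-based setting sketched in this abstract can be made concrete with a toy loop. The snippet below is purely illustrative (it is not a strategy from the article): a nearest-centroid classifier is retrained while uncertainty sampling queries the instance whose centroid distances differ least. Note that the simulated `oracle` and the final accuracy computation are exactly what is *not* available during deployment, which is why performance estimation there is hard.

```python
import random

def centroid(points):
    # component-wise mean of a list of 2-D points
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def uncertainty(x, c0, c1):
    # a small margin between the two centroid distances means high uncertainty
    return -abs(dist(x, c0) - dist(x, c1))

random.seed(0)
# two well-separated blobs as a toy dataset
pool = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)] \
     + [(random.gauss(3, 1), random.gauss(3, 1)) for _ in range(50)]
oracle = {x: int(i >= 50) for i, x in enumerate(pool)}  # simulated ground truth

labeled = {pool[0]: 0, pool[50]: 1}                     # seed: one label per class
unlabeled = [x for x in pool if x not in labeled]

for _ in range(10):                                     # query budget
    c0 = centroid([x for x, y in labeled.items() if y == 0])
    c1 = centroid([x for x, y in labeled.items() if y == 1])
    query = max(unlabeled, key=lambda x: uncertainty(x, c0, c1))
    labeled[query] = oracle[query]                      # simulated annotator
    unlabeled.remove(query)

predict = lambda x: 0 if dist(x, c0) < dist(x, c1) else 1
accuracy = sum(predict(x) == y for x, y in oracle.items()) / len(pool)
print(f"labeled {len(labeled)} of {len(pool)} instances, accuracy {accuracy:.2f}")
```

In simulation, `accuracy` can be computed against the full ground truth; in deployment, only the few queried labels exist, and they are biased toward the decision boundary by the selection strategy (challenge 3 above).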
Gathering labeled data to train well-performing machine learning models is one of the critical challenges in many applications. Active learning aims to reduce labeling costs by an efficient and effective allocation of costly labeling resources. In this article, we propose a decision-theoretic selection strategy that (1) directly optimizes the gain in misclassification error, and (2) uses a Bayesian approach, introducing a conjugate prior distribution to determine the class posterior and thereby deal with uncertainties. By reformulating existing selection strategies within our proposed model, we can explain which aspects are not covered by the current state of the art and why this leads to the superior performance of our approach. Extensive experiments on a large variety of datasets and different kernels validate our claims. (arXiv:2006.01732)
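The conjugate-prior idea can be illustrated for the binary case with generic Beta-Binomial conjugacy; this is only a sketch of the general mechanism, not the paper's concrete model, and the hyperparameter values below are illustrative. With a Beta(alpha, beta) prior on p(y=1|x) and k positive labels among n nearby observations, the posterior is Beta(alpha+k, beta+n-k) in closed form, so the estimate falls back to the prior when labels are scarce.

```python
# Beta-Binomial conjugacy for a binary class posterior (illustrative sketch).
def posterior_mean(k, n, alpha=1.0, beta=1.0):
    # mean of Beta(alpha + k, beta + n - k)
    return (alpha + k) / (alpha + beta + n)

def posterior_var(k, n, alpha=1.0, beta=1.0):
    # variance of Beta(a, b) is a*b / ((a+b)^2 * (a+b+1))
    a, b = alpha + k, beta + n - k
    return a * b / ((a + b) ** 2 * (a + b + 1))

# With no observations the estimate is the prior mean; labels sharpen it.
print(posterior_mean(0, 0))    # 0.5  (pure prior)
print(posterior_mean(9, 10))   # ~0.83, pulled toward 0.5 by the prior
print(posterior_var(0, 0) > posterior_var(9, 10))  # True: less data, more uncertainty
```

Carrying the full posterior, rather than a point estimate of the class probability, is what lets a decision-theoretic strategy weigh the expected gain of a query against its uncertainty.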
The same abstract also appears with the published version of the article (doi:10.1007/s10994-021-05986-9).
Denis Huseljic received his B.Sc. and M.Sc. degrees in computer science from the University of Kassel, Germany, where he is currently pursuing a Ph.D. degree in computer science. ... Appendices of "A Survey on Cost Types, Interaction Schemes, and Annotator Performance Models in Selection Algorithms for Active Learning in Classification", Marek Herde, Denis Huseljic, Bernhard Sick, ... (arXiv:2109.11301)
Pool-based active learning (AL) aims to optimize the annotation process (i.e., labeling), as the acquisition of annotations is often time-consuming and therefore expensive. For this purpose, an AL strategy intelligently queries annotations from annotators to train a high-performance classification model at low annotation cost. Traditional AL strategies operate in an idealized framework: they assume a single, omniscient annotator who never gets tired and charges uniformly regardless of query difficulty. However, in real-world applications, we often face human annotators, e.g., crowd or in-house workers, who make annotation mistakes and can be reluctant to respond when tired or faced with complex queries. Recently, a wide range of novel AL strategies has been proposed to address these issues. They differ from traditional AL in at least one of the following three central aspects: (1) they explicitly consider (multiple) human annotators whose performance can be affected by various factors, such as missing expertise; (2) they generalize the interaction with human annotators by considering different query and annotation types, such as asking an annotator for feedback on an inferred classification rule; and (3) they take more complex cost schemes regarding annotations and misclassifications into account. This survey provides an overview of these AL strategies and refers to them as real-world AL. To this end, we introduce a general real-world AL strategy as part of a learning cycle and use its elements, e.g., the query and annotator selection algorithm, to categorize about 60 real-world AL strategies. Finally, we outline possible directions for future research in the field of AL. (doi:10.1109/access.2021.3135514)
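One ingredient of real-world AL named above, an annotator performance model feeding an annotator selection algorithm, can be sketched as follows. The snippet is a generic illustration, not a strategy from the survey: each annotator's accuracy is estimated as the Beta posterior mean of their past agreement with resolved labels, and the next query goes to the annotator most likely to answer correctly. All names and numbers are hypothetical.

```python
# Sketch: estimating per-annotator reliability and selecting an annotator.
class Annotator:
    def __init__(self, name):
        self.name = name
        self.correct = 0   # past annotations that matched the resolved label
        self.total = 0

    def record(self, was_correct):
        self.total += 1
        self.correct += int(was_correct)

    def estimated_accuracy(self, alpha=1.0, beta=1.0):
        # Beta(alpha, beta) prior keeps annotators without history near 0.5
        return (alpha + self.correct) / (alpha + beta + self.total)

def select_annotator(annotators):
    # pick the annotator with the highest estimated accuracy
    return max(annotators, key=lambda a: a.estimated_accuracy())

crowd = [Annotator("expert"), Annotator("novice")]
history = [[True, True, True, False], [True, False, False, False]]
for annotator, outcomes in zip(crowd, history):
    for ok in outcomes:
        annotator.record(ok)

best = select_annotator(crowd)
print(best.name, round(best.estimated_accuracy(), 2))  # expert 0.67
```

A full strategy would additionally trade this reliability estimate off against per-annotator cost and query difficulty, which is exactly the kind of cost scheme the survey categorizes.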
Daniel Kottke, Adrian Calma, Denis Huseljic, Georg Krempl, Bernhard Sick. "We do not recommend the abbreviation AUC because it can be mixed up with AUROC" ... (dblp:conf/pkdd/KottkeCHKS17)
Denis Huseljic received the B.Sc. and M.Sc. degrees in computer science from the University of Kassel, Germany, where he is currently pursuing the Ph.D. degree in computer science. ... (doi:10.17170/kobra-202205036117)
2020 25th International Conference on Pattern Recognition (ICPR): program listing with contributions by Huseljic et al. (Day 2, Jan 13, 2021) and by Herde, Kottke, Huseljic, et al. (Day 4, Jan 15, 2021). (doi:10.1109/icpr48806.2021.9412725)