Exploiting Class Learnability in Noisy Data [article]

Matthew Klawonn, Eric Heim, James Hendler
<span title="2018-11-15">2018</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In many domains, collecting sufficient labeled training data for supervised machine learning requires easily accessible but noisy sources, such as crowdsourcing services or tagged Web data. Noisy labels occur frequently in data sets harvested via these means, sometimes resulting in entire classes of data on which learned classifiers generalize poorly. For real-world applications, we argue that it can be beneficial to avoid training on such classes entirely. In this work, we aim to explore the ... classes in a given data set, and guide supervised training to spend time on each class in proportion to its learnability. By focusing the training process, we aim to improve model generalization on classes with a strong signal. To that end, we develop an online algorithm that works in conjunction with a classifier and its training algorithm, iteratively selecting training data for the classifier based on how well it appears to generalize on each class. Testing our approach on a variety of data sets, we show that our algorithm learns to focus on classes for which the model has low generalization error relative to strong baselines, yielding a classifier with good performance on learnable classes.
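To make the idea of spending training time on a class in proportion to its learnability concrete, here is a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes that per-class validation accuracy is available as a stand-in for learnability, and the names `class_sampling_weights` and `select_batch` are hypothetical.

```python
import random


def class_sampling_weights(val_accuracy):
    """Turn per-class validation accuracy into sampling probabilities.

    Assumption: validation accuracy is used as a proxy for learnability.
    """
    if not val_accuracy:
        return {}
    total = sum(val_accuracy.values())
    if total == 0:
        # No class looks learnable yet: fall back to uniform sampling.
        return {c: 1.0 / len(val_accuracy) for c in val_accuracy}
    return {c: a / total for c, a in val_accuracy.items()}


def select_batch(data_by_class, weights, batch_size, rng):
    """Draw (example, label) pairs whose class mix follows the weights."""
    classes = list(weights)
    probs = [weights[c] for c in classes]
    chosen = rng.choices(classes, weights=probs, k=batch_size)
    return [(rng.choice(data_by_class[c]), c) for c in chosen]


# Hypothetical per-class accuracies measured on held-out data.
accuracy = {"cat": 0.9, "dog": 0.6, "noisy": 0.0}
weights = class_sampling_weights(accuracy)

data = {"cat": ["c1", "c2"], "dog": ["d1"], "noisy": ["n1"]}
batch = select_batch(data, weights, batch_size=8, rng=random.Random(0))
```

In an online setting, the accuracies would be re-estimated after each round of training and the weights recomputed, so classes on which the model stops improving gradually receive fewer examples.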
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1811.06524v1">arXiv:1811.06524v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/bkxo5dxjrvd2jle23wxuri7tkq">fatcat:bkxo5dxjrvd2jle23wxuri7tkq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20191022031720/https://arxiv.org/pdf/1811.06524v1.pdf" title="fulltext PDF download">Web Archive [PDF]</a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1811.06524v1" title="arxiv.org access">arxiv.org</a>