A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2010; you can also visit <a rel="external noopener" href="http://www.informatica.si:80/PDF/Informatica_2009_3.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="Institute of Electrical and Electronics Engineers (IEEE)">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/vtq6nmuchnak5n57l33lid22rq" style="color: black;">IEEE Transactions on Systems Man and Cybernetics Part B (Cybernetics)</a>
Low-quality or noisy data, which typically consists of erroneous values for both dependent and independent variables, has been demonstrated to have a significantly negative impact on the classification performance of most learning techniques. The impact on learner performance can be magnified when the class distribution is imbalanced or skewed. Unfortunately, in real-world environments, the presence of low-quality imbalanced data is a common occurrence. In most scenarios, the actual quality of such datasets is unknown to the data mining practitioner. In this study, we identify learners (from a total of 11 classification algorithms) with robust performance in the presence of low-quality imbalanced measurement data. Noise was injected into seven imbalanced software measurement datasets that were initially relatively free of noise. Learners were evaluated using analysis of variance models based on their performance as the level of injected noise, the number of attributes with noise, and the percentage of minority instances containing noise were increased. Four performance metrics suitable for class-imbalanced data were used to measure learner performance. Based on our results, we recommend using the random forest ensemble learning technique for building classification models from software measurement data, regardless of the quality and class distribution of the data. Summary: A method is presented for identifying robust classifiers in the presence of noisy data.<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/tsmcb.2007.912701">doi:10.1109/tsmcb.2007.912701</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/xvhlaf4m3vhcdb2cicqtfrug6m">fatcat:xvhlaf4m3vhcdb2cicqtfrug6m</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20100806123438/http://www.informatica.si:80/PDF/Informatica_2009_3.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">Web Archive [PDF]</a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/tsmcb.2007.912701">ieee.com</a>