
Problems with SZZ and Features: An empirical study of the state of practice of defect prediction data collection [article]

Steffen Herbold, Alexander Trautsch, Fabian Trautsch, Benjamin Ledel
2021 arXiv pre-print
Objective: We provide an empirical analysis of the defect labels created with the SZZ algorithm and the impact of commonly used features on results.  ...  Method: We used a combination of manual validation and adopted or improved heuristics for the collection of defect data. We conducted an empirical study on 398 releases of 38 Apache projects.  ...  We also want to thank the GWDG for the support in using their high performance computing infrastructure, which enabled the collection of the large amounts of software metric data.  ... 
arXiv:1911.08938v3 fatcat:xrj2fi7o6jbdfbym2jdflgeex4

A Framework for Evaluating the Results of the SZZ Approach for Identifying Bug-Introducing Changes

Daniel Alencar da Costa, Shane McIntosh, Weiyi Shang, Uira Kulesza, Roberta Coelho, Ahmed E. Hassan
2017 IEEE Transactions on Software Engineering  
Despite the foundational role of SZZ, little effort has been made to evaluate its results. Such an evaluation is a challenging task because the ground truth is not readily available.  ...  We use the proposed framework to evaluate five SZZ implementations using data from ten open source projects.  ...  Such an assessment of SZZ may also guide future manual analyses that are to be performed upon SZZ-generated data.  ... 
doi:10.1109/tse.2016.2616306 fatcat:swlpoeuivncxvkc6mxac4twr64

Better Data Labelling with EMBLEM (and how that Impacts Defect Prediction) [article]

Huy Tu and Zhe Yu and Tim Menzies
2020 arXiv pre-print
Also, in studies with 9 open source software projects, labelling via EMBLEM's incremental application of human+AI is at least an order of magnitude cheaper than existing methods (≈ eight times).  ...  For the data sets explored here, EMBLEM better labelling methods significantly improved P_opt20 and G-scores performance in nearly all the projects studied here.  ...  ACKNOWLEDGEMENTS This work was partially funded by an NSF CISE Grant #1826574 and #1931425.  ... 
arXiv:1905.01719v3 fatcat:uf5gnpl7rfhrjk3heqdjoetfs4

A systematic data collection procedure for software defect prediction

Goran Mausa, Tihana Galinac-Grbac, Bojana Dalbelo-Basic
2016 Computer Science and Information Systems  
Software defect prediction research relies on data that must be collected from otherwise separate repositories.  ...  This paper presents an exhaustive survey of techniques and approaches used in the data collection process.  ...  Then we use it for an empirical study to compare our own data collection procedure with existing practices and to study the effectiveness of linking techniques in different contexts.  ... 
doi:10.2298/csis141228061m fatcat:wq3sx4fiy5ek7fkdr43zilsoju

Within-Project Defect Prediction of Infrastructure-as-Code Using Product and Process Metrics

Stefano Dallapalma, Dario Di Nucci, Fabio Palomba, Damian Andrew Tamburri
2021 IEEE Transactions on Software Engineering  
In this paper, we aim at assessing the role of product and process metrics when predicting defective IaC scripts.  ...  The key results of the study report Random Forest as the best-performing model, with a median AUC-PR of 0.93 and MCC of 0.80.  ...  Fabio is partially supported by the Swiss National Science Foundation through the SNF Project No. PZ00P2 186090 (TED).  ... 
doi:10.1109/tse.2021.3051492 fatcat:6jhgjrniwfdb3meudqkw3i5nqi

On the Need of Removing Last Releases of Data When Using or Validating Defect Prediction Models [article]

Aalok Ahluwalia, Massimiliano Di Penta, Davide Falessi
2021 arXiv pre-print
To develop and train defect prediction models, researchers rely on datasets in which a defect is attributed to an artifact, e.g., a class of a given release.  ...  We analyze the accuracy of 15 machine learning defect prediction classifiers on data from more than 4,000 bugs and 600 releases of 19 open source projects from the Apache ecosystem.  ...  [63] proposed a practical defect prediction approach for companies that do not track defect-related data.  ... 
arXiv:2003.14376v2 fatcat:e3jtxuviy5hujkm52oufdgt5am

An Industrial Case Study on Shrinking Code Review Changesets through Remark Prediction [article]

Tobias Baum and Steffen Herbold and Kurt Schneider
2018 arXiv pre-print
Besides the main results on the mining and prediction of triggers for review remarks, we contribute experiences with a novel, multi-objective and interactive rule mining approach.  ...  To determine the importance of change parts, we extract data from software repositories and build prediction models for review remarks based on this data. The approach is discussed in detail.  ...  Furthermore, we would like to thank Eirini Ntoutsi for input on the data mining approach, Javad Ghofrani for providing his mining server and Melanie Busch and Wasja Brunotte for feedback on an article  ... 
arXiv:1812.09510v1 fatcat:biwaz4fokzhdfdaj4aoukofxli

How bugs are born: a model to identify how bugs are introduced in software components

Gema Rodríguez-Pérez, Gregorio Robles, Alexander Serebrenik, Andy Zaidman, Daniel M. Germán, Jesus M. Gonzalez-Barahona
2020 Empirical Software Engineering  
The lack of empirical evidence makes it impossible to assess how important these cases are and, therefore, to what extent the assumption is valid.  ...  The manual analysis helped classify the root cause of those bugs and created manually curated datasets with bug-introducing changes and with bugs that were not introduced by any change in the source code  ...  We also acknowledge the support of several authors by the Government of Spain through projects TIN2014-59400-R and "BugBirth" RTI2018-101963-B-I00.  ... 
doi:10.1007/s10664-019-09781-y fatcat:rf5dkrkidbekndokem5tut2wnu

Autorank: A Python package for automated ranking of classifiers

Steffen Herbold
2020 Journal of Open Source Software  
The distribution of the populations must be analyzed with the Shapiro-Wilk test for normality and, depending on the normality, with Levene's test or Bartlett's test for homogeneity of the data.  ...  Good reporting of the results goes beyond simply stating the significance of findings.  ...  Issues with SZZ: An empirical assessment of the state of practice of defect prediction data collection. Retrieved from http://arxiv.org/abs/1911.08938  ... 
doi:10.21105/joss.02173 fatcat:42kr3xizfjfgbmo7f2thqwdooy

Predictive Models in Software Engineering: Challenges and Opportunities [article]

Yanming Yang, Xin Xia, David Lo, Tingting Bi, John Grundy, Xiaohu Yang
2020 arXiv pre-print
Predictive models are one of the most important techniques that are widely applied in many areas of software engineering.  ...  We describe the key models and approaches used, classify the different models, summarize the range of key application areas, and analyze research results.  ...  [54] evaluated the impact of data noise on defect prediction with adoption of basic learners and bagging learners.  ... 
arXiv:2008.03656v1 fatcat:fe7ylphujfbobeo3g5yevniiei

Investigating code review quality: Do people and participation matter?

Oleksii Kononenko, Olga Baysal, Latifa Guerrouj, Yaxin Cao, Michael W. Godfrey
2015 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME)  
We applied the SZZ algorithm to detect bug-inducing changes that were then linked to the code review information extracted from the issue tracking system.  ...  In practice, bugs are sometimes unwittingly introduced during this process. In this paper, we report on an empirical study investigating code review quality for Mozilla, a large open-source project.  ... 
doi:10.1109/icsm.2015.7332457 dblp:conf/icsm/KononenkoBGCG15 fatcat:asb3gdonqjfd7ndltbkbgsfxny

The Relation of Test-Related Factors to Software Quality: A Case Study on Apache Systems

Fabiano Pecorelli, Fabio Palomba, Andrea De Lucia
2021 Empirical Software Engineering  
The key findings of the study show that, when controlling for other metrics (e.g., size of the production class), test-related factors have a limited connection to post-release defects.  ...  We first investigated how the presence of tests relates to post-release defects; then, we analyzed the role played by the test-related factors previously shown as significantly related to post-release  ...  Palomba gratefully acknowledges the support of the Swiss National Science Foundation through the SNF Projects No.  ... 
doi:10.1007/s10664-020-09891-y fatcat:psfducgepfatjovzinfjchifpa

Fault Prediction based on Software Metrics and SonarQube Rules. Machine or Deep Learning? [article]

Francesco Lomio, Sergio Moreschini, Valentina Lenarduzzi
2021 arXiv pre-print
As a result, fourteen of the 174 violated rules have an importance higher than 1% and account for 30% of the total fault-proneness importance, while the fault proneness of the remaining 165 rules is negligible  ...  We designed and conducted a case study among 33 Java projects analyzed with SonarQube and SZZ to identify fault-inducing and fault-fixing commits.  ...  , and usage of an issue tracking system with at least 100 issues reported.  ... 
arXiv:2103.11321v1 fatcat:xjnxngqolbfftisu7zzgsfx63q

Predicting risk of pre-release code changes with Checkinmentor

Alexander Tarvo, Nachiappan Nagappan, Thomas Zimmermann
2013 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE)  
The predictions are done not at a file or binary level but at the change level, thereby assessing the impact of each change.  ...  Code defects introduced during the development of the software system can result in failures after its release.  ...  on the issue, and other useful data.  ... 
doi:10.1109/issre.2013.6698912 dblp:conf/issre/TarvoNZ13 fatcat:gf4ew3tvkzgk7mlqv6266h72tq

A Systematic Literature Review and Meta-Analysis on Cross Project Defect Prediction

Seyedrebvar Hosseini, Burak Turhan, Dimuthu Gunarathna
2017 IEEE Transactions on Software Engineering  
Objective: To synthesise literature to understand the state-of-the-art in CPDP with respect to metrics, models, data approaches, datasets and associated performances.  ...  Cross project defect prediction (CPDP) recently gained considerable attention, yet there are no systematic efforts to analyse existing empirical evidence.  ...  improvements over the initial version of the manuscript.  ... 
doi:10.1109/tse.2017.2770124 fatcat:puodxjkkdjglpdxynaktjiycpu
Showing results 1 to 15 of 72