An ensemble model for addressing class imbalance and class overlap in software defect prediction.
Software defect prediction (SDP) is an important activity and an emerging challenge in software development, used to improve software quality. SDP identifies the software modules that are likely to contain defects, helping to allocate limited testing resources cost-efficiently and thereby reduce overall development cost. Various machine learning techniques have been used to build SDP models. However, a major obstacle for SDP models in identifying defective modules is the class imbalance of SDP datasets. Moreover, the existing literature shows that class overlap in imbalanced SDP datasets has a strongly negative impact on the predictive capability of SDP models. In this paper, we propose an ensemble SDP model that employs a four-stage pipeline to address class overlap and class imbalance simultaneously. Our approach integrates a class overlap reduction technique and an under-sampling technique with the extreme gradient boosting classifier (XGBoost). This integration lets the model handle both issues together, providing an improved solution for SDP tasks. We assess the effectiveness of the proposed model by comparing its performance against ten state-of-the-art SDP models on sixteen imbalanced software defect datasets. The experimental results, together with statistical analysis, indicate that the proposed model achieves superior predictive performance, surpassing the ten benchmark models on metrics such as recall, G-mean, F-measure, and AUC. [ABSTRACT FROM AUTHOR]
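For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how recall, G-mean, F-measure, and AUC are commonly computed for a binary defect label. It is not taken from the paper: the function names, the toy labels, and the scores are hypothetical, label 1 is assumed to mean "defective", and the F-measure is assumed to be the balanced F1 variant.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 as the defective class (assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def recall_gmean_f1(y_true, y_pred):
    """Compute recall, G-mean and F1 from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if tp + fn else 0.0        # defective modules found
    specificity = tn / (tn + fp) if tn + fp else 0.0   # clean modules left alone
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(recall * specificity)           # balances both classes
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, g_mean, f1


def auc(y_true, scores):
    """AUC as the chance that a defective module outscores a clean one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy data: 2 defective modules out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.35]

recall, g_mean, f1 = recall_gmean_f1(y_true, y_pred)
auc_value = auc(y_true, scores)
```

On this toy data, plain accuracy would be 0.8 even though only half of the defective modules are found (recall 0.5). This is why imbalance-aware metrics such as G-mean and AUC are commonly reported for SDP.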
Copyright of International Journal of Systems Assurance Engineering & Management is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)