A Neural Network Based Software Defect Prediction Approach Using SMOTE and Noise Filtering-CLNI

Authors

  • Ahmmed Bin Ashfaque Bangladesh Army University of Science and Technology, Fultola Khulna-9204, Bangladesh
  • Abdus Sattar Military Institute of Science and Technology
  • Hosney Jahan CSE Dept., EWU, Dhaka-1212
  • M. Akhtaruzzaman Department of CSE, Daffodil International University, Savar, Dhaka-1216, Bangladesh https://orcid.org/0000-0002-9929-4066
  • Fernaz Narin Nur Department of CSE, Daffodil International University, Savar, Dhaka-1216, Bangladesh

DOI:

https://doi.org/10.47981/j.mijst.13(02)2025.557(111-121)

Keywords:

Software Defect Prediction, SMOTE, CLNI, Dense Neural Network, Data Balancing, Feature Selection

Abstract

Software defects can cause significant losses and system failures across the software development life cycle. Software Defect Prediction (SDP) is therefore a vital step in ensuring software quality. To date, a number of machine learning models have been proposed to predict potential defects and make software more reliable. However, SDP models suffer from imbalanced datasets, which results in poor prediction accuracy. To mitigate this issue, several data balancing techniques, such as oversampling and undersampling, have been proposed. In some cases, however, these balancing methods introduce noisy and mislabeled samples into the dataset. To deal with these issues, we propose a neural network based approach that combines the Synthetic Minority Oversampling Technique (SMOTE) with the noise filtering technique Class Level Noise Identification (CLNI). We apply three CLNI methods: Edited Nearest Neighbor (ENN), Repeated ENN (RENN) and All-KNN. Our aim is to make the dataset clean, balanced and efficient by combining SMOTE with CLNI. In addition, we apply a number of feature selection methods to identify the most important features, which further contributes to better prediction accuracy. To evaluate the effectiveness of the proposed model, we conduct experiments on several benchmark datasets: MC1, PC1, PC2, PC3 and PC4 from the NASA MDP repository, and ML, LC and JDT from the AEEEM repository. The results are evaluated and compared in terms of accuracy, precision, recall and AUC-ROC. They show that the proposed approach achieves up to 98% accuracy and outperforms state-of-the-art approaches.
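
The balancing and cleaning steps described in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the function names (`nearest`, `smote`, `enn`), the parameter `k=3`, and the toy data are assumptions made for illustration. Only SMOTE followed by a majority-vote variant of ENN is shown; RENN, All-KNN, feature selection and the neural network are omitted.

```python
import math
import random
from collections import Counter


def nearest(idx, points, k):
    """Indices of the k points closest to points[idx] (Euclidean), excluding idx."""
    order = sorted(
        (j for j in range(len(points)) if j != idx),
        key=lambda j: math.dist(points[idx], points[j]),
    )
    return order[:k]


def smote(X, y, minority, k=3, seed=0):
    """Add synthetic minority samples until the two classes are the same size.

    Assumes a binary problem and at least two minority samples.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    needed = (len(y) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    for _ in range(max(needed, 0)):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(nearest(i, minority_pts, k))
        gap = rng.random()
        # Interpolate between a minority sample and one of its minority neighbours.
        X_new.append([a + gap * (b - a)
                      for a, b in zip(minority_pts[i], minority_pts[j])])
        y_new.append(minority)
    return X_new, y_new


def enn(X, y, k=3):
    """Edited Nearest Neighbours (majority-vote variant): drop every sample
    whose label disagrees with the most common label among its k neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: label 0 = non-defective, label 1 = defective.
# The point [5.1, 5.0] is labelled 0 but sits inside the defective cluster.
X = [[0.0, 0.0], [0.5, 0.2], [0.2, 0.6], [0.8, 0.8], [1.0, 0.3],
     [0.3, 1.0], [0.9, 0.1], [0.6, 0.6], [5.1, 5.0],
     [5.0, 5.0], [5.5, 5.2], [5.2, 5.6], [4.8, 5.4]]
y = [0] * 9 + [1] * 4

X_bal, y_bal = smote(X, y, minority=1)   # balance the classes first
X_clean, y_clean = enn(X_bal, y_bal)     # then filter label noise
print(Counter(y), Counter(y_bal), Counter(y_clean))
```

On this toy data, SMOTE raises the minority class to the size of the majority class, and the cleaning step then removes the one sample whose label contradicts its neighbourhood. For real datasets, a maintained library such as imbalanced-learn, which provides `SMOTE`, `EditedNearestNeighbours`, `RepeatedEditedNearestNeighbours` and `AllKNN`, would likely be a better starting point than this sketch.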

Published

2025-12-31

Section

ARTICLES

How to Cite

A Neural Network Based Software Defect Prediction Approach Using SMOTE and Noise Filtering-CLNI. (2025). MIST INTERNATIONAL JOURNAL OF SCIENCE AND TECHNOLOGY, 13(2), 111-121. https://doi.org/10.47981/j.mijst.13(02)2025.557(111-121)
