Combining Sampling Algorithms with Classification Algorithms to Improve Classification Performance on Imbalanced Datasets
Keywords: classification, imbalanced datasets, SMOTE
A dataset is imbalanced when one class has more data than the other classes. The ratio between the size of the majority class and the size of the minority class is called the Imbalance Ratio (IR): the greater the difference between the two classes, the larger the IR. Imbalanced datasets are a serious problem in data mining, because applying a classification algorithm without regard to class balance yields good predictions for the majority class while the minority class is neglected. In this research, the SMOTE algorithm was therefore applied to balance the datasets. The study used 4 datasets with different Imbalance Ratios and the classification algorithms C4.5, Naïve Bayes, K-NN, and SVM, and compared their performance before and after applying SMOTE. In terms of both accuracy and G-mean, the Naïve Bayes algorithm behaved consistently at every level of imbalance ratio: it did not perform well before SMOTE was applied, whereas after SMOTE was applied its accuracy increased consistently. It can therefore be concluded that the combination of SMOTE and Naïve Bayes is the most effective for imbalanced datasets with different levels of imbalance, under a scheme of 10-fold cross-validation and 80% data testing, repeated 50 times.
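To make the abstract's terms concrete, the sketch below shows in plain Python how the Imbalance Ratio, a SMOTE-style oversampling step and the G-mean metric can be computed. It is a simplified illustration, not the authors' implementation: the function names, the random choice of the base sample, the use of Euclidean distance for the nearest neighbours and the binary (two-class) form of G-mean are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """IR = size of the majority class / size of the minority class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def _euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority samples by SMOTE-style interpolation.

    Assumption for this sketch: each new point lies at a random position on
    the line segment between a randomly chosen minority sample and one of its
    k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # at most len-1 other samples can be neighbours
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: _euclidean(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def g_mean(y_true, y_pred, positive):
    """Binary G-mean: sqrt(sensitivity * specificity); `positive` is the minority label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    y = ["maj"] * 90 + ["min"] * 10
    print(imbalance_ratio(y))  # 9.0

    # A classifier that always predicts the majority class
    always_majority = ["maj"] * len(y)
    accuracy = sum(t == p for t, p in zip(y, always_majority)) / len(y)
    print(accuracy, g_mean(y, always_majority, positive="min"))  # 0.9 0.0

    # Toy minority set (hypothetical 2-D feature vectors) oversampled with 7 new points
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    new_points = smote(minority, n_synthetic=7, k=2)
    print(len(new_points))  # 7
```

The always-majority classifier reaches 0.9 accuracy yet a G-mean of 0.0, which is the "neglected minority class" effect the abstract describes and the reason G-mean is reported alongside accuracy. Classifiers such as C4.5, Naïve Bayes, K-NN or SVM would then be trained on the original samples plus the synthetic ones; in practice, oversampling is typically applied only to the training folds of a cross-validation split so that synthetic points never reach the test data.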
Copyright (c) 2021 The copyright of the article belongs to the authors.
This work is licensed under a Creative Commons Attribution 4.0 International License.