Comparison of the Performance of Algorithms Used to Solve the Class Imbalance Problem Across Different Classification Methods
AYDIN HAKLI, Duygu
Class imbalance occurs in a dataset when one or more groups contain relatively few observations compared with the other groups. Analyzing imbalanced datasets with machine learning algorithms has become a common and prominent research area in recent years. Class imbalance, however, tends to reduce model performance. In addition, the choice of classification model, the optimization of model parameters, the validation of the fitted model, and the underlying distribution and data structure may also affect model performance. Several data balancing algorithms have been proposed to overcome the class imbalance problem, such as SMOTE, SMOTEBoost, RUSBoost, MWMOTE, EasyEnsemble, SMOTEBagging and UnderBagging. In this study, we evaluated model performance through a comprehensive simulation study together with real data examples. The simulation study covered different classification models, class imbalance algorithms, sample sizes, correlation structures and class imbalance ratios. Each scenario was repeated 1000 times, and the fitted models were optimized using 5-fold cross-validation. The simulation study showed that model performance increases with sample size and with the correlation between the dependent and independent variables. When the correlation approaches zero and the classes are highly imbalanced, RUSBoost outperforms the other algorithms. As the data become more balanced, the seven algorithms give similar results regardless of sample size and correlation structure. Overall, in most of the simulation combinations, RUSBoost gave the best results for all sample sizes, and EasyEnsemble gave the best results for small sample sizes.
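SMOTE, the first balancing method listed above, creates synthetic minority-class observations by interpolation rather than by duplicating existing ones. The following is a minimal sketch in plain Python, assuming numeric features and Euclidean distance. The function name `smote` and its parameters are illustrative assumptions and are not taken from the thesis or from any library; the study may have used an existing implementation. Each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Hypothetical SMOTE-style oversampler (illustrative sketch only).

    minority: list of numeric feature vectors from the minority class.
    Returns n_synthetic new vectors, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist

    # Precompute the k nearest minority neighbours (Euclidean) of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))           # pick a minority sample
        x = minority[i]
        y = minority[rng.choice(neighbours[i])]    # pick one of its neighbours
        gap = rng.random()                         # uniform in [0, 1)
        # New point lies on the line segment from x towards y.
        synthetic.append([xi + gap * (yi - xi) for xi, yi in zip(x, y)])
    return synthetic


if __name__ == "__main__":
    # Four minority points at the corners of the unit square.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    new_points = smote(minority, n_synthetic=6, k=2, seed=42)
    print(len(new_points))  # 6
```

Ensemble variants such as SMOTEBoost and SMOTEBagging apply this kind of oversampling inside each boosting round or bag, whereas RUSBoost, UnderBagging and EasyEnsemble randomly undersample the majority class instead.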