THE IMPACT OF SMOTE-BASED RESAMPLING ON KNN FOR AUTOMATED SLEEP STAGING: A COMPARISON WITH MLP


Köksal A. S., Bakir M., Süzgen E. E.

5th International Gaziantep Scientific Research Congress, Gaziantep, Türkiye, 27 - 28 December 2025, pp.382-389, (Abstract Paper)

  • Publication Type: Conference Paper / Abstract Paper
  • City of Publication: Gaziantep
  • Country of Publication: Türkiye
  • Page Numbers: pp.382-389
  • Affiliated with Yozgat Bozok University: Yes

Abstract

Automatic sleep staging remains difficult for the N1 stage, where classifiers typically perform poorly. A multilayer perceptron (MLP) model was recently shown to improve N1 recall on the SleepEDF dataset when combined with data augmentation and resampling techniques such as SMOTE and cleaning-based undersampling. However, the effectiveness of distance-based classifiers in this setting has not yet been studied. In this work, we incorporate data augmentation techniques into a subject-based evaluation pipeline. We then extend our earlier methodological analysis of K-Nearest Neighbors (KNN) by directly comparing its results with the MLP model's performance on SleepEDF. We evaluate KNN with SMOTE oversampling variants and with cleaning configurations such as Tomek/ENN/AllKNN. For a like-for-like comparison, we use the same handcrafted feature set and the same subject-based cross-validation design, and we apply data augmentation only to the training set to prevent data leakage. We report macro-F1 and class-wise recall to characterize model behavior and to examine how performance differs between minority and majority sleep stages. The results show that data augmentation has a measurable effect, but one that depends on the model. KNN behaves differently from MLP, with improvements or declines that vary across the N1, N2, N3, REM, and WAKE classes. The study demonstrates that SMOTE-based resampling interacts differently with parametric and distance-based classifiers, with KNN showing heightened sensitivity to the neighborhood changes induced by synthetic samples. This work provides the first systematic assessment of SMOTE-based augmentation for distance-based sleep staging models and offers practical insights for future classifier design.
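The evaluation protocol described in the abstract (subject-based folds, oversampling applied only to the training portion, KNN classification, macro-F1 and class-wise recall) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, a simplified SMOTE-style interpolation, and randomly generated two-dimensional toy data in place of SleepEDF features. All function names, class counts, and parameters (`make_toy_data`, `smote`, `k=3`, `k=5`, four subjects, three stages) are assumptions made for illustration.

```python
import random
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def make_toy_data(n_subjects=4, seed=0):
    """Synthetic stand-in for per-epoch features; N1 is deliberately rare."""
    rng = random.Random(seed)
    centers = {"WAKE": (0.0, 0.0), "N1": (1.0, 1.0), "N2": (2.0, 0.0)}
    counts = {"WAKE": 30, "N1": 6, "N2": 30}
    X, y, subjects = [], [], []
    for s in range(n_subjects):
        for label, (cx, cy) in centers.items():
            for _ in range(counts[label]):
                X.append([rng.gauss(cx, 0.6), rng.gauss(cy, 0.6)])
                y.append(label)
                subjects.append(s)
    return X, y, subjects

def subject_folds(subjects, n_folds):
    """Yield (train, test) index lists so that no subject is in both."""
    unique = sorted(set(subjects))
    for f in range(n_folds):
        held_out = set(unique[f::n_folds])
        train = [i for i, s in enumerate(subjects) if s not in held_out]
        test = [i for i, s in enumerate(subjects) if s in held_out]
        yield train, test

def smote(X, y, minority, k=3, rng=None):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its k nearest minority neighbors until the class matches the largest."""
    rng = rng or random.Random(0)
    pts = [x for x, c in zip(X, y) if c == minority]
    new_X, new_y = list(X), list(y)
    if len(pts) < 2:
        return new_X, new_y
    for _ in range(max(Counter(y).values()) - len(pts)):
        a = rng.choice(pts)
        nbrs = sorted((p for p in pts if p is not a),
                      key=lambda p: dist2(a, p))[:k]
        b = rng.choice(nbrs)
        t = rng.random()
        new_X.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        new_y.append(minority)
    return new_X, new_y

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training samples."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: dist2(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def recall_and_macro_f1(y_true, y_pred, labels):
    """Return ({class: recall}, macro-averaged F1)."""
    recalls, f1s = {}, []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        rec = tp / (tp + fn) if tp + fn else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        recalls[c] = rec
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return recalls, sum(f1s) / len(f1s)

def evaluate(X, y, subjects, use_smote, n_folds=4):
    """Subject-wise cross-validation; resampling touches training data only."""
    y_true, y_pred = [], []
    for train, test in subject_folds(subjects, n_folds):
        tr_X = [X[i] for i in train]
        tr_y = [y[i] for i in train]
        if use_smote:
            tr_X, tr_y = smote(tr_X, tr_y, "N1", rng=random.Random(0))
        for i in test:  # test epochs are never resampled
            y_true.append(y[i])
            y_pred.append(knn_predict(tr_X, tr_y, X[i]))
    return recall_and_macro_f1(y_true, y_pred, sorted(set(y)))

if __name__ == "__main__":
    X, y, subjects = make_toy_data()
    for flag in (False, True):
        recalls, macro_f1 = evaluate(X, y, subjects, use_smote=flag)
        print("SMOTE" if flag else "baseline", recalls, round(macro_f1, 3))
```

Because the synthetic samples are generated inside each fold from training subjects only, held-out subjects never influence the resampled neighborhoods. Whether N1 recall rises or falls on this toy data says nothing about the SleepEDF results reported in the paper.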