Enhancing File Security with an Optimized Auto-Classification Framework Based on Learning Models


Açıkgöz Z., Arslan S., Arslan R. S.

2025 9th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Ankara, Türkiye, 14-16 November 2025, pp. 1-6 (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • DOI Number: 10.1109/ismsit67332.2025.11268095
  • City of Publication: Ankara
  • Country of Publication: Türkiye
  • Pages: pp. 1-6
  • Affiliated with Yozgat Bozok University: Yes

Abstract

Malicious PDF files pose a significant threat to digital security: they can compromise sensitive information and disrupt system operations. Detecting and classifying such files is therefore critical to maintaining cybersecurity. This study introduces a systematic approach to categorizing PDF documents. From a dataset of roughly 30,000 PDF files, 43 structural and general features were extracted. The features were converted to numerical vectors using TF-IDF, N-gram, and Word2Vec techniques, and the resulting data were analyzed with a variety of machine learning and deep learning models. Among the machine learning models, the Support Vector Machine achieved 0.9967 accuracy with TF-IDF, and the Decision Tree reached 0.9966 accuracy with Count Vectorizer and N-gram. Among the deep learning models, CNN achieved up to 0.9967 accuracy with Count Vectorizer and N-gram, while BiLSTM and GRU also performed well with Word2Vec. These results indicate that the proposed approach provides reliable and effective detection of PDF-based malware.
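
The vectorization step named in the abstract can be illustrated with a minimal, dependency-free sketch of N-gram extraction followed by TF-IDF weighting. This is not the authors' implementation: the function names `ngrams` and `tfidf_vectors`, the smoothed IDF formula, and the sample PDF keywords are assumptions chosen for illustration, since the abstract does not specify the 43 features or the exact weighting used.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all contiguous n-grams of length 1..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_vectors(docs, n_max=2):
    """Map each token list to a dict of n-gram -> TF-IDF weight.

    Uses relative term frequency and a smoothed IDF,
    idf(g) = ln((1 + N) / (1 + df(g))) + 1 (an assumed variant).
    """
    grams_per_doc = [ngrams(doc, n_max) for doc in docs]
    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter(g for grams in grams_per_doc for g in set(grams))
    n_docs = len(docs)
    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams) or 1  # avoid division by zero for empty documents
        vectors.append({
            g: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[g])) + 1.0)
            for g, c in counts.items()
        })
    return vectors


# Hypothetical structural keywords extracted from two PDF files.
docs = [
    ["/JavaScript", "/OpenAction", "/Launch"],
    ["/Page", "/Font", "/Page"],
]
vectors = tfidf_vectors(docs)
```

Each resulting dictionary is a sparse feature vector that could be fed to a classifier such as a Support Vector Machine or Decision Tree; n-grams shared by many files receive lower weights than those concentrated in a few.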