ARTEMIS: An Explainable AI Framework for Multi-Class COVID-19 Diagnosis with a Newly Curated Dataset

Sahin, Muhammet; Ulutaş, Hasan; Erkoç, Mustafa; Karakaya, Baris; Günay, Recep; Süzgen, Enes

doi:10.3390/bioengineering13050588

Sahin M. E., Ulutaş H., Erkoç M. F., Karakaya B., Günay R. B., Süzgen E. E.

Bioengineering, vol. 13, no. 5, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 13, Issue: 5
Publication Date: 2026
DOI: 10.3390/bioengineering13050588
Journal Name: Bioengineering
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, INSPEC, Directory of Open Access Journals, Academic Search Ultimate (EBSCO), Natural Science Collection (ProQuest), Biological Science Database (ProQuest), Materials Science & Engineering Collection (ProQuest), Technology Collection (ProQuest)
Keywords: COVID-19, CT, deep learning, Explainable AI (Grad-CAM++), X-ray
Affiliated with Yozgat Bozok University: Yes

Abstract

In this work, we propose ARTEMIS, a novel and highly interpretable deep learning pipeline that automatically classifies Chest X-ray (CXR) and Computed Tomography (CT) images into three clinically important categories: COVID-19 infection, Community-Acquired Pneumonia (CAP), and Normal. Unlike existing models that rely on a static feature-enhancement step, ARTEMIS introduces a learnable preprocessing component that dynamically adapts image contrast and sharpness during training, allowing the enhancement to be optimized adaptively. Our hybrid network combines an EfficientNet-B0 backbone, which includes built-in squeeze-and-excitation (SE) attention, with an optional lightweight Transformer encoder block to jointly learn local radiological features and global relationships between pixels. We conducted comprehensive experiments on five datasets spanning the X-ray and CT modalities: four publicly available datasets and one new CT dataset annotated by radiologists. The results show strong robustness and generalization, with macro F1-scores above 96% on the public datasets and 99.39% accuracy on our new CT dataset. To interpret the decision-making process, we use Grad-CAM++ to generate class-discriminative saliency maps; a board-certified radiologist systematically validated the highlighted regions against established radiological criteria, confirming that model decisions are grounded in clinically meaningful pulmonary findings rather than imaging artifacts.
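
The abstract reports two different metrics: macro F1 on the public datasets and accuracy on the new CT dataset. As a reminder of how the two differ on a three-class problem, the following is a minimal sketch in plain Python. It is not the authors' evaluation code; the function names `macro_f1` and `accuracy` and the toy labels are illustrative assumptions, and none of the numbers come from the paper.

```python
from collections import Counter

# The three classes named in the abstract.
CLASSES = ["COVID-19", "CAP", "Normal"]


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in classes:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels for illustration only (not data from the paper).
y_true = ["COVID-19"] * 4 + ["CAP"] * 2 + ["Normal"] * 4
y_pred = ["COVID-19"] * 3 + ["CAP"] + ["CAP"] * 2 + ["Normal"] * 4

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

Because macro F1 weights each class equally regardless of how many samples it has, a single error involving a small class (here CAP) lowers it more than it lowers accuracy, which is why the two figures are not directly comparable.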