Categorization of Microscopic Wood Images with Transfer Learning Approach on Pretrained Vision Transformer Models

Kılıç K.

BioResources, vol.20, no.3, pp.6394-6405, 2025 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 20 Issue: 3
  • Publication Date: 2025
  • DOI Number: 10.15376/biores.20.3.6394-6405
  • Journal Name: BioResources
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Agricultural & Environmental Science Database, CAB Abstracts, Chemical Abstracts Core, Compendex, Veterinary Science Database, Directory of Open Access Journals
  • Page Numbers: 6394-6405
  • Keywords: Computer vision, Deep learning, Vision transformer, Wood classification, Wood products industrial engineering
  • Open Archive Collection: AVESIS Open Access Collection
  • Yozgat Bozok University Affiliated: Yes

Abstract

Four Vision Transformer (ViT)-based models were optimized to classify microscopic wood images: DeiT, Google ViT, BEiT, and Microsoft Swin Transformer. Training was performed on a set enriched with data augmentation, which increased the number of images per class and thereby strengthened the models' generalization ability. The dataset consisted of 112 species from 30 families, of which 37 were conifers and 75 were angiosperms. The samples had been softened, cut into thin sections, stained with the triple staining method, and imaged at fixed magnification. The Google ViT model was the most successful, with 99.40% accuracy. The DeiT model, notable for its data efficiency, ranked second with 98.51% accuracy, while the BEiT and Microsoft Swin Transformer models reached 96.43% and 98.21%, respectively. The Microsoft Swin Transformer required the least training time. Data augmentation improved the performance of all models by 3% to 5%, increasing their resistance to overfitting and yielding more robust predictions. Overall, ViT-based models delivered superior performance in microscopic wood image classification, and data augmentation significantly improved model performance.
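The abstract does not specify which augmentation operations were applied. A minimal sketch of how per-class image counts can be expanded with simple geometric transforms, using NumPy arrays as stand-ins for the microscopy images (the specific operations and 5x expansion factor here are illustrative assumptions, not the paper's reported pipeline):

```python
import numpy as np

def augment(image: np.ndarray) -> list[np.ndarray]:
    """Generate geometric variants of one image.

    Hypothetical ops (the paper does not list its exact augmentations):
    horizontal/vertical flips and 90/270-degree rotations, all of which
    preserve the shape of a square image.
    """
    return [
        np.fliplr(image),      # horizontal flip
        np.flipud(image),      # vertical flip
        np.rot90(image, k=1),  # rotate 90 degrees
        np.rot90(image, k=3),  # rotate 270 degrees
    ]

def augment_class(images: list[np.ndarray]) -> list[np.ndarray]:
    """Expand one class's image list with augmented copies.

    Each original yields four variants, so a class of N images
    grows to 5N, increasing the number of images per class.
    """
    out = list(images)
    for img in images:
        out.extend(augment(img))
    return out
```

With, say, a 224x224 RGB section image, a class of 10 originals grows to 50 samples, which is the sense in which augmentation "increases the number of images for each class" before fine-tuning the pretrained transformers.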