Categorization of Microscopic Wood Images with Transfer Learning Approach on Pretrained Vision Transformer Models

Kılıç K.

BioResources, vol.20, no.3, pp.6394-6405, 2025 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 20 Issue: 3
  • Publication Date: 2025
  • DOI Number: 10.15376/biores.20.3.6394-6405
  • Journal Name: BioResources
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Agricultural & Environmental Science Database, CAB Abstracts, Chemical Abstracts Core, Compendex, Veterinary Science Database, Directory of Open Access Journals
  • Page Numbers: 6394-6405
  • Keywords: Computer vision, Deep learning, Vision transformer, Wood classification, Wood products industrial engineering
  • Open Archive Collection: AVESIS Open Access Collection
  • Yozgat Bozok University Affiliated: Yes

Abstract

Four Vision Transformer (ViT)-based models were optimized to classify microscopic wood images: DeiT, Google ViT, BEiT, and Microsoft Swin Transformer. Training was performed on a set enriched with data augmentation, which increased the number of images per class and thereby strengthened the models' generalization ability. The dataset consisted of 112 species from 30 families, of which 37 were conifers and 75 were angiosperms. The samples had been softened, cut into thin sections, stained with the triple staining method, and imaged at fixed magnification. The Google ViT model was the most successful, with 99.40% accuracy. The DeiT model, notable for its data efficiency, ranked second with 98.51% accuracy, while the BEiT and Microsoft Swin Transformer models reached 96.43% and 98.21%, respectively. The Microsoft Swin Transformer required the least training time. Data augmentation improved the performance of all models by 3% to 5%, increasing their resistance to overfitting and yielding more robust predictions. Overall, ViT-based models delivered superior performance in microscopic wood image classification, and data augmentation significantly improved model performance.
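The abstract does not specify which augmentation operations were applied. A minimal sketch of how per-class image counts can be expanded with simple geometric transforms, using NumPy arrays as stand-ins for the microscopy images (the specific operations and 5x expansion factor here are illustrative assumptions, not the paper's reported pipeline):

```python
import numpy as np

def augment(image: np.ndarray) -> list[np.ndarray]:
    """Generate geometric variants of one image.

    Hypothetical ops (the paper does not list its exact augmentations):
    horizontal/vertical flips and 90/270-degree rotations, all of which
    preserve the shape of a square image.
    """
    return [
        np.fliplr(image),      # horizontal flip
        np.flipud(image),      # vertical flip
        np.rot90(image, k=1),  # rotate 90 degrees
        np.rot90(image, k=3),  # rotate 270 degrees
    ]

def augment_class(images: list[np.ndarray]) -> list[np.ndarray]:
    """Expand one class's image list with augmented copies.

    Each original yields four variants, so a class of N images
    grows to 5N, increasing the number of images per class.
    """
    out = list(images)
    for img in images:
        out.extend(augment(img))
    return out
```

With, say, a 224x224 RGB section image, a class of 10 originals grows to 50 samples, which is the sense in which augmentation "increases the number of images for each class" before fine-tuning the pretrained transformers.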