Determination of milk yield in water buffaloes using multi-class logistic regression and machine learning methods


Creative Commons License

Boğa D. Ç., Boğa M., Ermetin O.

Tropical Animal Health and Production, vol.57, no.7, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 57 Issue: 7
  • Publication Date: 2025
  • DOI Number: 10.1007/s11250-025-04579-1
  • Journal Name: Tropical Animal Health and Production
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, Environment Index
  • Keywords: Cross-validation, Gradient boosting machines, Milk yield, Random forest, Support vector machines, Water buffalo
  • Open Archive Collection: AVESİS Open Access Collection
  • Yozgat Bozok University Affiliated: Yes

Abstract

In this study, Random Forest (RF), Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and Multi-Class Logistic Regression (MCLR) models were comparatively evaluated for predicting milk yield in water buffaloes. The main purpose of the study was to compare how well these models predict milk yield, as measured by their accuracy rates. In response to reviewer feedback, the methodology was enhanced to include stratified 8-fold cross-validation, hyperparameter tuning for RF, GBM, and SVM, and the removal of the multicollinear AGE variable. The revised dataset comprised the following features: lactation period (LP), lactation milk yield (LDMY), age at first pregnancy (1stPregAge), and feed type. Model training and evaluation were conducted using Python 3.7 with the Pandas, NumPy, Scikit-learn, and Matplotlib libraries. According to the updated findings, the GBM model outperformed the others, achieving an average accuracy of 64.63%, weighted precision of 0.6578, recall of 0.6463, F1-score of 0.6311, and ROC AUC of 0.6625. While the predictive performance remains moderate, these results demonstrate the potential of advanced machine.
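The evaluation protocol described in the abstract (four classifiers compared under stratified 8-fold cross-validation with weighted metrics) can be sketched roughly as follows with Scikit-learn. This is an illustrative sketch only, not the authors' code: the data are synthetic stand-ins generated with `make_classification`, the three-class target, the sample size, and all hyperparameters are assumptions, and the hyperparameter tuning step mentioned in the abstract is omitted for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in for the buffalo records. Four numeric
# features loosely mirror LP, LDMY, 1stPregAge, and an encoded feed type;
# the three yield classes are hypothetical.
X, y = make_classification(
    n_samples=240, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# The four model families named in the abstract, with default-like settings
# (ASSUMPTION: the study's tuned hyperparameters are not known here).
models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "GBM": GradientBoostingClassifier(random_state=0),
    # probability=True is needed so ROC AUC can use class probabilities.
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "MCLR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# Stratified 8-fold cross-validation keeps class proportions similar per fold.
cv = StratifiedKFold(n_splits=8, shuffle=True, random_state=0)

# Weighted multi-class versions of the metrics reported in the abstract.
scoring = {
    "accuracy": "accuracy",
    "precision": "precision_weighted",
    "recall": "recall_weighted",
    "f1": "f1_weighted",
    "roc_auc": "roc_auc_ovr_weighted",
}

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Average each metric over the 8 folds.
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}

for name, metrics in results.items():
    print(name, {m: round(v, 4) for m, v in metrics.items()})
```

With real data, a categorical feed type would need encoding (for example one-hot) before fitting, and each model could be wrapped in `GridSearchCV` to approximate the tuning step. The numbers printed for the synthetic data are not comparable to the results reported in the abstract.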