SynthEd: From synthetic data to simulated learners


Coşgun H. A., Genç Kumtepe E.

  • Publication date: 2 April 2026
  • Version: 2
  • University / Institution: Yozgat Bozok Üniversitesi
  • Research area: Open and Distance Learning; Social Sciences and Humanities; Computer Science; Artificial Intelligence, Machine Learning and Pattern Recognition; Software; Engineering and Technology

Description

SynthEd is an agent-based framework for generating behaviorally coherent synthetic student data for Open & Distance Learning (ODL) research. It simulates student populations whose engagement trajectories, interaction logs, and dropout decisions emerge from 11 theory-grounded modules — including Tinto's integration model, Bean & Metzner's environmental factors, Bäulke et al.'s 6-phase dropout process, Garrison's Community of Inquiry, and Moore's transactional distance theory.

Key capabilities:

  • Persona-driven agents with Big Five personality, SDT motivation dynamics, and academic exhaustion
  • Two-phase weekly simulation: individual behavior + emergent peer network effects
  • Multi-semester support with inter-semester carry-over mechanics
  • 17+ statistical validation tests ensuring theoretical fidelity
  • Configurable for different institutional profiles (developing country ODL, western university, corporate training, mega university)

SynthEd addresses three persistent challenges in educational data mining: privacy restrictions on real student data (GDPR/KVKK), class imbalance in dropout datasets, and temporal incoherence in GAN/VAE-generated synthetic data. All generated data is entirely fictional with no mapping to real individuals.
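
The two-phase weekly simulation listed among the key capabilities can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `Student` fields, the update rules, the constants, and the function names are hypothetical and are not taken from SynthEd's code. The sketch only shows the general shape of one week: an individual update for each student, followed by a peer-network update computed from a snapshot of the first phase.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Student:
    """Hypothetical agent; field names are illustrative, not SynthEd's schema."""
    student_id: int
    engagement: float                      # kept in [0, 1]
    peers: list[int] = field(default_factory=list)
    dropped_out: bool = False


def clamp(x: float) -> float:
    """Keep a value inside [0, 1]."""
    return max(0.0, min(1.0, x))


def simulate_week(students: dict[int, Student], rng: random.Random) -> None:
    active = [s for s in students.values() if not s.dropped_out]

    # Phase 1: individual behavior. Each student's engagement drifts on its own.
    for s in active:
        s.engagement = clamp(s.engagement + rng.gauss(0.0, 0.05))

    # Phase 2: peer network effects. Each student moves toward the mean
    # engagement of their active peers, using a snapshot so that all
    # updates see the same phase-1 values.
    snapshot = {s.student_id: s.engagement for s in active}
    for s in active:
        peer_values = [snapshot[p] for p in s.peers if p in snapshot]
        if peer_values:
            peer_mean = sum(peer_values) / len(peer_values)
            s.engagement = clamp(s.engagement + 0.2 * (peer_mean - s.engagement))

    # Illustrative dropout rule (an assumption): low engagement makes dropout likely.
    for s in active:
        if s.engagement < 0.15 and rng.random() < 0.5:
            s.dropped_out = True


def run_semester(n_students: int = 50, n_weeks: int = 14, seed: int = 0) -> dict[int, Student]:
    rng = random.Random(seed)
    students = {i: Student(i, engagement=rng.uniform(0.3, 0.9)) for i in range(n_students)}
    for s in students.values():
        others = [i for i in students if i != s.student_id]
        s.peers = rng.sample(others, k=3)  # random peer links, for illustration only
    for _ in range(n_weeks):
        simulate_week(students, rng)
    return students
```

Calling `run_semester(seed=0)` returns the final population. Because all randomness flows through one seeded `random.Random`, repeated runs with the same seed produce identical trajectories.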

Output formats: students.csv, interactions.csv, outcomes.csv, weekly_engagement.csv, pipeline_report.json

Tech stack: Python 3.10+, NumPy, SciPy, pytest (46 tests), GitHub Actions CI/CD

Versions

  • Version 1: 30 March 2026
  • Version 2 (published): 2 April 2026