SynthEd is an agent-based framework for generating behaviorally coherent synthetic student data for Open & Distance Learning (ODL) research. It simulates student populations whose engagement trajectories, interaction logs, and dropout decisions emerge from 11 theory-grounded modules — including Tinto's integration model, Bean & Metzner's environmental factors, Bäulke et al.'s 6-phase dropout process, Garrison's Community of Inquiry, and Moore's transactional distance theory.
Key capabilities:
- Persona-driven agents with Big Five personality, SDT motivation dynamics, and academic exhaustion
- Two-phase weekly simulation: individual behavior + emergent peer network effects
- Multi-semester support with inter-semester carry-over mechanics
- 17+ statistical validation tests ensuring theoretical fidelity
- Configurable for different institutional profiles (developing country ODL, western university, corporate training, mega university)
SynthEd addresses three persistent challenges in educational data mining: privacy restrictions on real student data (GDPR/KVKK), class imbalance in dropout datasets, and temporal incoherence in GAN/VAE-generated synthetic data. All generated data is entirely fictional with no mapping to real individuals.
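The two-phase weekly simulation listed among the capabilities can be pictured with a minimal sketch. This is not SynthEd's actual code or API: the `Student` class, the `simulate_week` function, and every parameter (`drift`, `peer_weight`, `dropout_threshold`) are hypothetical names chosen for illustration. The random drift and the fixed dropout threshold are placeholders standing in for the theory-grounded modules.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Student:
    """Hypothetical agent: one engagement score plus a list of peer indices."""
    engagement: float                      # kept in [0, 1]
    peers: list = field(default_factory=list)
    dropped: bool = False


def clamp(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))


def simulate_week(students, rng, drift=0.05, peer_weight=0.2,
                  dropout_threshold=0.1):
    """Advance all students by one week in two phases (illustrative only)."""
    # Phase 1: individual behavior. Each active student's engagement drifts
    # independently (a placeholder for persona-driven dynamics).
    for s in students:
        if not s.dropped:
            s.engagement = clamp(s.engagement + rng.uniform(-drift, drift))

    # Phase 2: peer network effects. Snapshots are taken first so the result
    # does not depend on the order in which students are visited.
    engagement_before = [s.engagement for s in students]
    active_before = [not s.dropped for s in students]
    for s in students:
        if s.dropped:
            continue
        peer_values = [engagement_before[j] for j in s.peers if active_before[j]]
        if peer_values:
            peer_mean = sum(peer_values) / len(peer_values)
            # Pull engagement part of the way toward the active peers' mean.
            s.engagement = clamp(
                s.engagement + peer_weight * (peer_mean - s.engagement)
            )
        # Placeholder dropout rule: a simple threshold, assumed for this sketch.
        if s.engagement < dropout_threshold:
            s.dropped = True
    return students


# Usage: five students on a ring network, simulated for 14 weeks.
rng = random.Random(42)
students = [
    Student(engagement=rng.uniform(0.3, 0.9), peers=[(i - 1) % 5, (i + 1) % 5])
    for i in range(5)
]
for week in range(14):
    simulate_week(students, rng)
```

Separating the two phases means the peer update always reads the engagement values produced by the individual phase of the same week, so network effects are computed from a consistent population state rather than from partially updated agents.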
Output formats: students.csv, interactions.csv, outcomes.csv, weekly_engagement.csv, pipeline_report.json
Tech stack: Python 3.10+, NumPy, SciPy, pytest (46 tests), GitHub Actions CI/CD