LANGUAGE TESTING, vol.37, no.3, pp.311-332, 2020 (AHCI)
This study examines the decision-making behaviors of raters with varying levels of experience while assessing EFL essays of distinct qualities. The data were collected from 28 raters who had varying levels of rating experience and worked in the English language departments of different universities in Turkey. Using a 10-point analytic rubric, each rater scored 16 essays of distinct text qualities, voice-recorded their thoughts through think-aloud protocols (TAPs) while scoring, and provided brief score explanations. Data collected from TAPs were analyzed using a coding scheme adapted from Cumming, Kantor, and Powers (2002). The results revealed that text quality had a larger effect than rating experience on raters' decision-making behaviors. In addition, raters prioritized aspects of style, grammar, and mechanics when rating low-quality essays, but emphasized rhetoric and their general impressions of the text for high-quality essays. Furthermore, low-experienced raters differed more in their behaviors while assessing scripts of distinct qualities than did the medium- and high-experienced groups. The findings suggest that raters' scoring behaviors might evolve with practice, resulting in less variation in their decisions. As such, this research provides implications for developing strategy-based rater training programs, which might help to increase consistency across raters of different experience levels.