The Unit Testlet Dilemma: PISA Sample



Ayan C., BARIŞ PEKMEZCİ F.

INTERNATIONAL JOURNAL OF ASSESSMENT TOOLS IN EDUCATION, vol.8, no.3, pp.613-632, 2021 (ESCI)

  • Publication Type: Article / Full Article
  • Volume: 8 Issue: 3
  • Publication Date: 2021
  • DOI Number: 10.21449/ijate.948734
  • Journal Name: INTERNATIONAL JOURNAL OF ASSESSMENT TOOLS IN EDUCATION
  • Journal Indexes: Emerging Sources Citation Index (ESCI), ERIC (Education Resources Information Center), TR DİZİN (ULAKBİM)
  • Page Numbers: pp.613-632
  • Keywords: PISA, Testlet items, Local Dependence, Marginal item parameters, RESPONSE THEORY MODEL, LIMITED-INFORMATION, ITEM, DEPENDENCE
  • Yozgat Bozok University Affiliated: Yes

Abstract

Testlets offer advantages that are well accepted in the literature, such as making it possible to measure higher-order thinking skills and saving time. For this reason, they are often preferred in implementations ranging from in-class assessments to large-scale assessments. With the increased use of testlets, the following questions have become topics of debate: "Is sharing a common stem enough for items to be treated as a testlet?" "Which estimation method should be preferred in implementations containing this type of item?" "Is there an alternative estimation method for the PISA implementation, which consists of this type of item?" In addition, which statistical model to use for estimating these items has become a popular topic of discussion, since they violate the local independence assumption. In light of these discussions, this study aimed to clarify the unit-testlet ambiguity using various item response theory (IRT) models when testlets consist of mixed item types (dichotomous and polytomous), for the science and math tests of PISA 2018. The findings showed that while the bifactor model fits the data best, the fit of the unidimensional model is quite close to that of the bifactor model for both data sets (science and math). The multidimensional IRT model, on the other hand, has the weakest model fit for both tests. In line with these findings, the methods used to identify testlet items are discussed, and estimation suggestions are made for implementations using testlets, especially PISA.