Final published version
Licence: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - Comparative Judgement for evaluating young learners’ EFL writing performances: Reliability and teacher perceptions of holistic and dimension-based judgements
AU - Sickinger, Rebecca
AU - Brunfaut, Tineke
AU - Pill, John
PY - 2025/4/30
Y1 - 2025/4/30
AB - Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges’ pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to be a reliable method of evaluating performances. This study extends the CJ research base to young learner EFL writing contexts and innovates CJ procedures with a novel dimension-based approach. Twenty-seven Austrian EFL educators evaluated 300 young learners’ EFL scripts (addressing two task types) from a national examination, using three scoring methods: standard CJ (holistic), CJ by dimensions (our new criteria-based method), and the exam’s conventional analytic rating. It was found that both holistic CJ and our dimension-based CJ were reliable methods of evaluating young learners’ EFL scripts. Experienced EFL teachers who also have experience with using marking schemes proved to be reliable CJ judges. Moreover, despite the preference of some for the more familiar analytic rating method, teachers displayed higher reliability and shorter decision-making times when using CJ. Benefits of dimension-based CJ for reliable and economical scoring of large-scale young learner EFL writing scripts, and the potential for positive washback, are discussed.
KW - Analytic rating
KW - assessing writing
KW - comparative judgement
KW - testing writing
KW - rater reliability
KW - testing young learners
KW - dimension-based judging
KW - holistic judging
KW - pairwise comparison
U2 - 10.1177/02655322241288847
DO - 10.1177/02655322241288847
M3 - Journal article
VL - 42
SP - 137
EP - 166
JO - Language Testing
JF - Language Testing
SN - 0265-5322
IS - 2
ER -