Comparative Judgement for evaluating young learners’ EFL writing performances: Reliability and teacher perceptions of holistic and dimension-based judgements

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Comparative Judgement for evaluating young learners’ EFL writing performances: Reliability and teacher perceptions of holistic and dimension-based judgements. / Sickinger, Rebecca; Brunfaut, Tineke; Pill, John.
In: Language Testing, Vol. 42, No. 2, 30.04.2025, p. 137-166.

Bibtex

@article{a98d95ab43884ad4b7130145f09e6003,
title = "Comparative Judgement for evaluating young learners{\textquoteright} EFL writing performances: Reliability and teacher perceptions of holistic and dimension-based judgements",
abstract = "Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges{\textquoteright} pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to be a reliable method of evaluating performances. This study extends the CJ research base to young learner EFL writing contexts and innovates CJ procedures with a novel dimension-based approach. Twenty-seven Austrian EFL educators evaluated 300 young learners{\textquoteright} EFL scripts (addressing two task types) from a national examination, using three scoring methods: standard CJ (holistic), CJ by dimensions (our new criteria-based method), and the exam{\textquoteright}s conventional analytic rating. It was found that both holistic CJ and our dimension-based CJ were reliable methods of evaluating young learners{\textquoteright} EFL scripts. Experienced EFL teachers who also have experience with using marking schemes proved to be reliable CJ judges. Moreover, despite the preference of some for the more familiar analytic rating method, teachers displayed higher reliability and shorter decision-making times when using CJ. Benefits of dimension-based CJ for reliable and economical scoring of large-scale young learner EFL writing scripts, and the potential for positive washback, are discussed.",
keywords = "Analytic rating, assessing writing, comparative judgement, testing writing, rater reliability, testing young learners, dimension-based judging, holistic judging, pairwise comparison",
author = "Rebecca Sickinger and Tineke Brunfaut and John Pill",
year = "2025",
month = apr,
day = "30",
doi = "10.1177/02655322241288847",
language = "English",
volume = "42",
pages = "137--166",
journal = "Language Testing",
issn = "0265-5322",
publisher = "SAGE Publications Ltd",
number = "2",

}

RIS

TY - JOUR
T1 - Comparative Judgement for evaluating young learners’ EFL writing performances
T2 - Reliability and teacher perceptions of holistic and dimension-based judgements
AU - Sickinger, Rebecca
AU - Brunfaut, Tineke
AU - Pill, John
PY - 2025/4/30
Y1 - 2025/4/30
AB - Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges’ pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to be a reliable method of evaluating performances. This study extends the CJ research base to young learner EFL writing contexts and innovates CJ procedures with a novel dimension-based approach. Twenty-seven Austrian EFL educators evaluated 300 young learners’ EFL scripts (addressing two task types) from a national examination, using three scoring methods: standard CJ (holistic), CJ by dimensions (our new criteria-based method), and the exam’s conventional analytic rating. It was found that both holistic CJ and our dimension-based CJ were reliable methods of evaluating young learners’ EFL scripts. Experienced EFL teachers who also have experience with using marking schemes proved to be reliable CJ judges. Moreover, despite the preference of some for the more familiar analytic rating method, teachers displayed higher reliability and shorter decision-making times when using CJ. Benefits of dimension-based CJ for reliable and economical scoring of large-scale young learner EFL writing scripts, and the potential for positive washback, are discussed.

KW - Analytic rating
KW - assessing writing
KW - comparative judgement
KW - testing writing
KW - rater reliability
KW - testing young learners
KW - dimension-based judging
KW - holistic judging
KW - pairwise comparison
U2 - 10.1177/02655322241288847
DO - 10.1177/02655322241288847
M3 - Journal article
VL - 42
SP - 137
EP - 166
JO - Language Testing
JF - Language Testing
SN - 0265-5322
IS - 2
ER -