
Electronic data

  • 2023SickingerPhD

    Final published version, 5.31 MB, PDF document

    Embargo ends: 28/11/25

    Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

An Exploration of Comparative Judgement for Evaluating Writing Performances of the Austrian Year 8 Test for English as a Foreign Language

Research output: Thesis > Doctoral Thesis

  • Beci Sickinger
Publication date: 2025
Number of pages: 370
Awarding Institution: Lancaster University
Award date: 25/09/2023
Original language: English


Comparative judgement (CJ) is an evaluation method in which a rank order is constructed from judges’ pairwise comparisons of performances. CJ has been shown to be reliable and practical in various contexts, but it is under-researched and under-utilised both in second language (L2) testing and as a method for evaluating performance dimensions/criteria independently. The present thesis investigated the use of CJ for the evaluation of lower-secondary school English as a Foreign Language (EFL) written performances from a national test in Austria.
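As an illustration of how a rank order can be derived from pairwise comparisons, the following is a minimal sketch that fits a Bradley-Terry model, a common choice in CJ work. It is not taken from the thesis, which may have used a different model or software. The function name `fit_bradley_terry`, the script labels and the comparison data are all hypothetical.

```python
import math
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=200):
    """Estimate a log-scale quality score per script from (winner, loser) pairs.

    Uses the standard minorise-maximise update for the Bradley-Terry model.
    Assumption: each pair contains two different scripts, and every script
    wins at least once (a script with no wins has no finite estimate).
    """
    wins = defaultdict(int)         # comparisons won by each script
    pair_counts = defaultdict(int)  # times each unordered pair was compared
    scripts = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        scripts.update((winner, loser))

    strength = {s: 1.0 for s in scripts}
    for _ in range(n_iter):
        updated = {}
        for s in scripts:
            denom = 0.0
            for pair, n in pair_counts.items():
                if s in pair:
                    (other,) = pair - {s}
                    denom += n / (strength[s] + strength[other])
            updated[s] = wins[s] / denom
        # Rescale so that the log scores sum to zero (fixes the scale origin).
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {s: v / math.exp(log_mean) for s, v in updated.items()}
    return {s: math.log(v) for s, v in strength.items()}


# Hypothetical judgements: each tuple is (preferred script, other script).
comparisons = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("B", "C"), ("B", "C"), ("C", "B"),
    ("A", "C"), ("A", "C"), ("C", "A"),
]
scores = fit_bradley_terry(comparisons)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The resulting scores lie on a logit scale, so the difference between two scripts' scores corresponds to the modelled log-odds that one is preferred over the other.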
The study used a mixed-methods research design and consisted of two strands. In Strand 1, 27 participants (Austrian EFL educators) evaluated 300 EFL scripts using CJ: once holistically (judging all aspects of one script against all aspects of a second script) and once by each of a set of dimensions/criteria (judging the features of one performance dimension/criterion for one script against the equivalent features in a second script). Additionally, the participants rated the scripts using an analytic rating scale (the conventional rating method).
CJ was found to be a reliable method of evaluating EFL scripts in all judgement sessions (scale separation reliability, the CJ reliability measure adopted from Rasch modelling, ≥ .89). Experienced teachers who had previously evaluated similar scripts were reliable judges (infit values ≤ 1.5) and were more reliable when using CJ than when rating. Participants reported considering a broader range of writing features when rating than when using CJ.
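For readers unfamiliar with scale separation reliability (SSR), the sketch below shows one commonly used formulation: the share of the observed variance in script estimates that is not attributable to measurement error. This is an assumption about the general form of the statistic, not a reproduction of the thesis's computation. The function name and the example values are hypothetical, and some implementations use the sample variance rather than the population variance.

```python
from statistics import pvariance


def scale_separation_reliability(estimates, standard_errors):
    """Return (observed variance - mean squared error) / observed variance.

    estimates: script quality estimates on the logit scale.
    standard_errors: the standard error of each estimate, in the same order.
    """
    observed_var = pvariance(estimates)
    error_var = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    return (observed_var - error_var) / observed_var


# Hypothetical estimates and standard errors for five scripts.
estimates = [-2.0, -1.0, 0.0, 1.0, 2.0]
standard_errors = [0.4, 0.4, 0.4, 0.4, 0.4]
ssr = scale_separation_reliability(estimates, standard_errors)
print(round(ssr, 2))
```

Values close to 1 indicate that the spread of the script estimates is large relative to their uncertainty.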
In Strand 2, think-aloud protocols were collected from eight participants while they were judging scripts with CJ. Findings indicated two approaches to the CJ decision-making process for EFL lower-secondary scripts: one reflecting a more traditional rating approach and the other a quicker, reliable approach tailored to CJ.
Overall, this thesis suggests that CJ can be a reliable and time-efficient evaluation method for EFL writing when used by trained raters and/or teachers who have experience of evaluating written performances in the classroom.