An Exploration of Comparative Judgement for Evaluating Writing Performances of the Austrian Year 8 Test for English as a Foreign Language

Linguistics and English Language

Electronic data

2023SickingerPhD
Final published version, 5.31 MB, PDF document
Embargo ends: 28/11/25
Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Text available via DOI:

https://doi.org/10.17635/lancaster/thesis/2187
Final published version

Research output: Thesis › Doctoral Thesis

Unpublished

Beci Sickinger

Publication date	2025
Number of pages	370
Qualification	PhD
Awarding Institution	Lancaster University
Supervisors/Advisors	Brunfaut, Tineke, Supervisor; Pill, John, Supervisor
Award date	25/09/2023
Publisher	Lancaster University
Original language	English

Abstract

Comparative judgement (CJ) is an evaluation method whereby a rank order is constructed from judges’ pairwise comparisons of performances. CJ has been shown to be reliable and practical in various contexts, but it is currently under-researched and under-utilised both in second language (L2) testing and as a method to evaluate performance dimensions/criteria independently. The present thesis investigated the use of CJ for the evaluation of lower-secondary school English as a Foreign Language (EFL) written performances from a national test in Austria.
The study used a mixed-methods research design and consisted of two strands. In Strand 1, 27 participants (Austrian EFL educators) evaluated 300 EFL scripts using CJ: once holistically (judging all aspects of one script against all aspects of a second script) and once by each of a set of dimensions/criteria (judging the features of one performance dimension/criterion for one script against the equivalent features in a second script). Additionally, the participants rated the scripts using an analytic rating scale (the conventional rating method).
CJ was found to be a reliable method of evaluating EFL scripts in all judgement sessions (scale separation reliability, the CJ measure taken over from Rasch modelling, ≥ .89). Experienced teachers who had previously evaluated similar scripts were reliable judges (infit values ≤ 1.5) and were more reliable when using CJ than when rating. Participants reported considering a broader range of writing features when rating than when using CJ.
In Strand 2, think-aloud protocols were collected from eight participants while they were judging scripts with CJ. Findings indicated two approaches to the CJ decision-making process for EFL lower-secondary scripts—one reflecting a more traditional rating approach and the other a quicker, reliable approach tailored to CJ.
Overall, this thesis suggests that CJ can be a reliable and time-efficient evaluation method for EFL writing when used by trained raters and/or teachers who have experience of evaluating written performances in the classroom.
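
As an illustration of how a rank order and a scale separation reliability (SSR) figure can be derived from pairwise judgements, the following is a minimal sketch only. It assumes a Bradley-Terry model fitted with a simple iterative update, and an SSR computed as the share of observed variance in the script estimates that is not attributable to estimation error. The function names, the toy data, and these modelling choices are assumptions made for this example; the abstract does not state which software or estimation procedure the thesis used.

```python
import math


def fit_bradley_terry(n_scripts, comparisons, iterations=500):
    """Estimate one quality value (in logits) per script.

    comparisons is a list of (winner, loser) index pairs.
    Assumption: every script wins and loses at least once; otherwise
    its estimate would drift towards plus or minus infinity. (This is
    a necessary condition only; the scripts must also be linked to one
    another through the comparisons.)
    """
    wins = [0] * n_scripts
    losses = [0] * n_scripts
    pair_counts = {}
    for winner, loser in comparisons:
        wins[winner] += 1
        losses[loser] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1
    if min(wins) == 0 or min(losses) == 0:
        raise ValueError("each script needs at least one win and one loss")

    strengths = [1.0] * n_scripts
    for _ in range(iterations):
        denom = [0.0] * n_scripts
        for (i, j), n in pair_counts.items():
            share = n / (strengths[i] + strengths[j])
            denom[i] += share
            denom[j] += share
        strengths = [wins[i] / denom[i] for i in range(n_scripts)]
        # Rescale so the logits are centred on zero.
        log_mean = sum(math.log(s) for s in strengths) / n_scripts
        strengths = [s / math.exp(log_mean) for s in strengths]
    return [math.log(s) for s in strengths], pair_counts


def scale_separation_reliability(theta, pair_counts):
    """SSR = (observed variance - mean squared standard error) / observed variance.

    Standard errors use the approximation 1 / sqrt(information),
    where information sums n * p * (1 - p) over a script's comparisons.
    """
    info = [0.0] * len(theta)
    for (i, j), n in pair_counts.items():
        p = 1.0 / (1.0 + math.exp(theta[j] - theta[i]))
        info[i] += n * p * (1.0 - p)
        info[j] += n * p * (1.0 - p)
    mean_sq_error = sum(1.0 / v for v in info) / len(theta)
    mean_theta = sum(theta) / len(theta)
    observed_var = sum((t - mean_theta) ** 2 for t in theta) / (len(theta) - 1)
    return (observed_var - mean_sq_error) / observed_var


# Hypothetical toy data: three scripts (0, 1, 2), four judgements per pair.
comparisons = (
    [(0, 1)] * 3 + [(1, 0)]
    + [(1, 2)] * 3 + [(2, 1)]
    + [(0, 2)] * 3 + [(2, 0)]
)
theta, pair_counts = fit_bradley_terry(3, comparisons)
ssr = scale_separation_reliability(theta, pair_counts)
rank_order = sorted(range(3), key=lambda i: theta[i], reverse=True)
```

With so few scripts and judgements the SSR value from this toy example is not meaningful; the sketch only shows the shape of the calculation. In practice a dedicated Rasch or CJ package would normally be used.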
