
Electronic data

  • 2023SickingerPhD

    Final published version, 5.31 MB, PDF document

    Embargo ends: 28/11/25

    Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Text available via DOI: 10.17635/lancaster/thesis/2187


An Exploration of Comparative Judgement for Evaluating Writing Performances of the Austrian Year 8 Test for English as a Foreign Language

Research output: Thesis › Doctoral Thesis

Unpublished

Standard

An Exploration of Comparative Judgement for Evaluating Writing Performances of the Austrian Year 8 Test for English as a Foreign Language. / Sickinger, Beci.
Lancaster University, 2025. 370 p.

Research output: Thesis › Doctoral Thesis


Bibtex

@phdthesis{08fc093f92c54c1f9d4ff4fb11e07019,
title = "An Exploration of Comparative Judgement for Evaluating Writing Performances of the Austrian Year 8 Test for English as a Foreign Language",
abstract = "Comparative judgement (CJ) is an evaluation method whereby a rank order is constructed from judges{\textquoteright} pairwise comparisons of performances. CJ has been shown to be reliable and practical in various contexts, but it is currently under-researched and under-utilised in second language (L2) testing and as a method to evaluate performance dimensions/criteria independently. The present thesis investigated the use of CJ for the evaluation of lower-secondary school English as a Foreign Language (EFL) written performances from a national test in Austria. The study used a mixed-methods research design and consisted of two strands. In Strand 1, 27 participants (Austrian EFL educators) evaluated 300 EFL scripts using CJ: once holistically (judging all aspects of one script against all aspects of a second script) and once by each of a set of dimensions/criteria (judging the features of one performance dimension/criterion for one script against the equivalent features in a second script). Additionally, the participants rated the scripts using an analytic rating scale (the conventional rating method). CJ was found to be a reliable method of evaluating EFL scripts in all judgement sessions (scale separation reliability, the CJ measure taken over from Rasch modelling, ≥ .89). Teachers with experience of evaluating similar scripts were reliable judges (infit values ≤ 1.5) and more reliable when using CJ than when rating. Participants reported considering a broader range of writing features when rating than when using CJ. In Strand 2, think-aloud protocols were collected from eight participants while they were judging scripts with CJ. Findings indicated two approaches to the CJ decision-making process for EFL lower-secondary scripts: one reflecting a more traditional rating approach, and the other a quicker, reliable approach tailored to CJ. Overall, this thesis suggests that CJ can be a reliable and time-efficient evaluation method for EFL writing when used by trained raters and/or teachers who have experience of evaluating written performances in the classroom.",
author = "Beci Sickinger",
year = "2025",
doi = "10.17635/lancaster/thesis/2187",
language = "English",
publisher = "Lancaster University",
school = "Lancaster University",

}
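The abstract describes CJ as building a rank order from judges' pairwise comparisons. As a minimal illustrative sketch (not the thesis's own software, which uses Rasch-family modelling; the Bradley-Terry model used here is the standard pairwise-comparison formulation equivalent to the Rasch pairwise case), the ranking step can be implemented with the classic MM iteration:

```python
# Minimal sketch: fit Bradley-Terry strengths to pairwise comparison
# outcomes and derive a rank order, using Zermelo's MM iteration.
# All names here are illustrative, not from the thesis.
from collections import defaultdict

def bradley_terry(comparisons, n_items, iters=100):
    """comparisons: list of (winner, loser) item-index pairs."""
    wins = defaultdict(int)    # wins[i]: comparisons item i won
    pairs = defaultdict(int)   # pairs[(i, j)]: times i and j met (i < j)
    for w, l in comparisons:
        wins[w] += 1
        pairs[(min(w, l), max(w, l))] += 1
    p = [1.0] * n_items        # strength parameters, initialised equal
    for _ in range(iters):
        new = []
        for i in range(n_items):
            # MM update: p_i = W_i / sum_j n_ij / (p_i + p_j)
            denom = 0.0
            for (a, b), n in pairs.items():
                if i in (a, b):
                    j = b if i == a else a
                    denom += n / (p[i] + p[j])
            new.append(wins[i] / denom if denom else p[i])
        s = sum(new)
        p = [x * n_items / s for x in new]  # normalise to fix the scale
    return p

# Toy data: item 2 beats 1 twice and 0 once; item 1 beats 0 once.
strengths = bradley_terry([(2, 1), (2, 0), (1, 0), (2, 1)], 3)
order = sorted(range(3), key=lambda i: -strengths[i])
```

The fitted strengths put item 2 first and item 0 last, matching the comparison outcomes; in practice CJ tools also report judge fit statistics (such as the infit values cited above) alongside the rank order.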

RIS

TY - THES

T1 - An Exploration of Comparative Judgement for Evaluating Writing Performances of the Austrian Year 8 Test for English as a Foreign Language

AU - Sickinger, Beci

PY - 2025

Y1 - 2025

N2 - Comparative judgement (CJ) is an evaluation method whereby a rank order is constructed from judges’ pairwise comparisons of performances. CJ has been shown to be reliable and practical in various contexts, but it is currently under-researched and under-utilised in second language (L2) testing and as a method to evaluate performance dimensions/criteria independently. The present thesis investigated the use of CJ for the evaluation of lower-secondary school English as a Foreign Language (EFL) written performances from a national test in Austria. The study used a mixed-methods research design and consisted of two strands. In Strand 1, 27 participants (Austrian EFL educators) evaluated 300 EFL scripts using CJ: once holistically (judging all aspects of one script against all aspects of a second script) and once by each of a set of dimensions/criteria (judging the features of one performance dimension/criterion for one script against the equivalent features in a second script). Additionally, the participants rated the scripts using an analytic rating scale (the conventional rating method). CJ was found to be a reliable method of evaluating EFL scripts in all judgement sessions (scale separation reliability, the CJ measure taken over from Rasch modelling, ≥ .89). Teachers with experience of evaluating similar scripts were reliable judges (infit values ≤ 1.5) and more reliable when using CJ than when rating. Participants reported considering a broader range of writing features when rating than when using CJ. In Strand 2, think-aloud protocols were collected from eight participants while they were judging scripts with CJ. Findings indicated two approaches to the CJ decision-making process for EFL lower-secondary scripts: one reflecting a more traditional rating approach, and the other a quicker, reliable approach tailored to CJ. Overall, this thesis suggests that CJ can be a reliable and time-efficient evaluation method for EFL writing when used by trained raters and/or teachers who have experience of evaluating written performances in the classroom.

AB - Comparative judgement (CJ) is an evaluation method whereby a rank order is constructed from judges’ pairwise comparisons of performances. CJ has been shown to be reliable and practical in various contexts, but it is currently under-researched and under-utilised in second language (L2) testing and as a method to evaluate performance dimensions/criteria independently. The present thesis investigated the use of CJ for the evaluation of lower-secondary school English as a Foreign Language (EFL) written performances from a national test in Austria. The study used a mixed-methods research design and consisted of two strands. In Strand 1, 27 participants (Austrian EFL educators) evaluated 300 EFL scripts using CJ: once holistically (judging all aspects of one script against all aspects of a second script) and once by each of a set of dimensions/criteria (judging the features of one performance dimension/criterion for one script against the equivalent features in a second script). Additionally, the participants rated the scripts using an analytic rating scale (the conventional rating method). CJ was found to be a reliable method of evaluating EFL scripts in all judgement sessions (scale separation reliability, the CJ measure taken over from Rasch modelling, ≥ .89). Teachers with experience of evaluating similar scripts were reliable judges (infit values ≤ 1.5) and more reliable when using CJ than when rating. Participants reported considering a broader range of writing features when rating than when using CJ. In Strand 2, think-aloud protocols were collected from eight participants while they were judging scripts with CJ. Findings indicated two approaches to the CJ decision-making process for EFL lower-secondary scripts: one reflecting a more traditional rating approach, and the other a quicker, reliable approach tailored to CJ. Overall, this thesis suggests that CJ can be a reliable and time-efficient evaluation method for EFL writing when used by trained raters and/or teachers who have experience of evaluating written performances in the classroom.

U2 - 10.17635/lancaster/thesis/2187

DO - 10.17635/lancaster/thesis/2187

M3 - Doctoral Thesis

PB - Lancaster University

ER -
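The abstract's reliability criterion, scale separation reliability (SSR) ≥ .89, is the Rasch separation reliability: the proportion of observed variance in the estimated measures that is not measurement error. A small sketch of that computation, under the standard formula SSR = (SD²_obs − mean(SE²)) / SD²_obs (illustrative only; the thesis's actual software is not specified here):

```python
# Sketch of Rasch scale separation reliability (SSR):
# the share of observed variance in the estimated measures
# that is "true" variance rather than measurement error.
import statistics

def scale_separation_reliability(measures, standard_errors):
    observed_var = statistics.pvariance(measures)            # variance of estimates
    error_var = statistics.mean(se ** 2 for se in standard_errors)  # mean square error
    return (observed_var - error_var) / observed_var

# Widely spread measures with small standard errors give high SSR.
ssr = scale_separation_reliability([-2.0, -1.0, 0.0, 1.0, 2.0], [0.3] * 5)
```

With this toy input SSR comes out around .96, i.e. comfortably above the .89 threshold the abstract reports; larger standard errors relative to the spread of measures would pull it down.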