Rights statement: This article has been accepted for publication in International Journal of Corpus Linguistics, Volume 13, Issue 4, 2008, pages 519-549, © 2008 John Benjamins. The publisher should be contacted for permission to re-use the material in any form.
Accepted author manuscript, 640 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - From key words to key semantic domains
AU - Rayson, P.
N1 - This article has been accepted for publication in International Journal of Corpus Linguistics, Volume 13, Issue 4, 2008, pages 519-549, © 2008 John Benjamins. The publisher should be contacted for permission to re-use the material in any form.
PY - 2008
Y1 - 2008
N2 - This paper reports the extension of the key words method for the comparison of corpora. Using automatic tagging software that assigns part-of-speech and semantic field (domain) tags, a method is described which permits the extraction of key domains by applying the keyness calculation to tag frequency lists. The combination of the key words and key domains methods is shown to allow macroscopic analysis (the study of the characteristics of whole texts or varieties of language) to inform the microscopic level (focussing on the use of a particular linguistic feature), thereby suggesting those linguistic features which should be investigated further. The resulting 'data-driven' approach presented here combines elements of both the 'corpus-based' and 'corpus-driven' paradigms in corpus linguistics. A web-based tool, Wmatrix, implementing the proposed method is applied in a case study: the comparison of UK 2001 general election manifestos of the Labour and Liberal Democratic parties.
AB - This paper reports the extension of the key words method for the comparison of corpora. Using automatic tagging software that assigns part-of-speech and semantic field (domain) tags, a method is described which permits the extraction of key domains by applying the keyness calculation to tag frequency lists. The combination of the key words and key domains methods is shown to allow macroscopic analysis (the study of the characteristics of whole texts or varieties of language) to inform the microscopic level (focussing on the use of a particular linguistic feature), thereby suggesting those linguistic features which should be investigated further. The resulting 'data-driven' approach presented here combines elements of both the 'corpus-based' and 'corpus-driven' paradigms in corpus linguistics. A web-based tool, Wmatrix, implementing the proposed method is applied in a case study: the comparison of UK 2001 general election manifestos of the Labour and Liberal Democratic parties.
KW - cs_eprint_id 2387
KW - cs_uid 355
U2 - 10.1075/ijcl.13.4.06ray
DO - 10.1075/ijcl.13.4.06ray
M3 - Journal article
VL - 13
SP - 519
EP - 549
JO - International Journal of Corpus Linguistics
JF - International Journal of Corpus Linguistics
SN - 1569-9811
IS - 4
ER -