Final published version
Licence: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - Supporting the corpus-based study of Shakespeare’s language
T2 - Enhancing a corpus of the First Folio
AU - Culpeper, Jonathan
AU - Hardie, Andrew
AU - Demmen, Jane
AU - Hughes, Jennifer
AU - Timperley, Matt
PY - 2021/5/1
Y1 - 2021/5/1
N2 - This article explores challenges in the corpus linguistic analysis of Shakespeare’s language, and Early Modern English more generally, with particular focus on elaborating possible solutions and the benefits they bring. An account of work that took place within the Encyclopedia of Shakespeare’s Language Project (2016–2019) is given, which discusses the development of the project’s data resources, specifically, the Enhanced Shakespearean Corpus. Topics covered include the composition of the corpus and its subcomponents; the structure of the XML markup; the design of the extensive character metadata; and the word-level corpus annotation, including spelling regularisation, part-of-speech tagging, lemmatisation and semantic tagging. The challenges that arise from each of these undertakings are not exclusive to a corpus-based treatment of Shakespeare’s plays but it is in the context of Shakespeare’s language that they are so severe as to seem almost insurmountable. The solutions developed for the Enhanced Shakespearean Corpus – often combining automated manipulation with manual interventions, and always principled – offer a way through.
AB - This article explores challenges in the corpus linguistic analysis of Shakespeare’s language, and Early Modern English more generally, with particular focus on elaborating possible solutions and the benefits they bring. An account of work that took place within the Encyclopedia of Shakespeare’s Language Project (2016–2019) is given, which discusses the development of the project’s data resources, specifically, the Enhanced Shakespearean Corpus. Topics covered include the composition of the corpus and its subcomponents; the structure of the XML markup; the design of the extensive character metadata; and the word-level corpus annotation, including spelling regularisation, part-of-speech tagging, lemmatisation and semantic tagging. The challenges that arise from each of these undertakings are not exclusive to a corpus-based treatment of Shakespeare’s plays but it is in the context of Shakespeare’s language that they are so severe as to seem almost insurmountable. The solutions developed for the Enhanced Shakespearean Corpus – often combining automated manipulation with manual interventions, and always principled – offer a way through.
KW - Corpus linguistics
KW - Shakespeare
KW - First Folio
U2 - 10.2478/icame-2021-0002
DO - 10.2478/icame-2021-0002
M3 - Journal article
VL - 45
SP - 37
EP - 86
JO - ICAME Journal
JF - ICAME Journal
SN - 1502-5462
IS - 1
ER -