
Electronic data

  • icame-2015-0001

    Rights statement: © 2015. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. (CC BY-NC-ND 3.0)

    Final published version, 471 KB, PDF document

    Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Links

Text available via DOI: 10.1515/icame-2015-0001


Guidelines for normalising early modern English corpora: decisions and justifications

Research output: Contribution to Journal/Magazine › Journal article › Peer-review

Published

Standard

Guidelines for normalising early modern English corpora: decisions and justifications. / Archer, Dawn; Kytö, Merja; Baron, Alistair et al.
In: ICAME Journal, Vol. 39, No. 1, 01.03.2015, p. 5-24.


Vancouver

Archer D, Kytö M, Baron A, Rayson PE. Guidelines for normalising early modern English corpora: decisions and justifications. ICAME Journal. 2015 Mar 1;39(1):5-24. doi: 10.1515/icame-2015-0001

Author

Archer, Dawn ; Kytö, Merja ; Baron, Alistair et al. / Guidelines for normalising early modern English corpora : decisions and justifications. In: ICAME Journal. 2015 ; Vol. 39, No. 1. pp. 5-24.

Bibtex

@article{bb64abb49ab041a89e4d4b7602f500ec,
title = "Guidelines for normalising early modern English corpora: decisions and justifications",
abstract = "Corpora of Early Modern English have been collected and released for research for a number of years. With large scale digitisation activities gathering pace in the last decade, much more historical textual data is now available for research on numerous topics including historical linguistics and conceptual history. We summarise previous research which has shown that it is necessary to map historical spelling variants to modern equivalents in order to successfully apply natural language processing and corpus linguistics methods. Manual and semiautomatic methods have been devised to support this normalisation and standardisation process. We argue that it is important to develop a linguistically meaningful rationale to achieve good results from this process. In order to do so, we propose a number of guidelines for normalising corpora and show how these guidelines have been applied in the Corpus of English Dialogues.",
author = "Dawn Archer and Merja Kyt{\"o} and Alistair Baron and Rayson, {Paul Edward}",
note = "{\textcopyright} 2015. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. (CC BY-NC-ND 3.0)",
year = "2015",
month = mar,
day = "1",
doi = "10.1515/icame-2015-0001",
language = "English",
volume = "39",
pages = "5--24",
journal = "ICAME Journal",
issn = "1502-5462",
publisher = "Walter de Gruyter GmbH",
number = "1",

}
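The abstract's central operation, mapping historical spelling variants to their modern equivalents before applying NLP or corpus-linguistic tools, can be illustrated with a deliberately minimal dictionary-lookup sketch in Python (3.9+). The lexicon entries, the `normalise` function and the tokenisation rule are all hypothetical and chosen for illustration only; this is not the set of guidelines proposed in the article, nor the behaviour of any particular normalisation tool.

```python
import re

# Hypothetical toy lexicon: early modern spelling -> modern form.
# Illustrative entries only; not taken from the article or any real tool.
VARIANT_LEXICON = {
    "haue": "have",
    "seene": "seen",
    "vpon": "upon",
    "onely": "only",
}

# Split text into runs of letters (words) and runs of everything else,
# so punctuation and whitespace can be reassembled unchanged.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


def normalise(text: str, lexicon: dict[str, str] = VARIANT_LEXICON) -> str:
    """Replace known historical variants with modern equivalents,
    keeping an initial capital where the original word had one."""
    out = []
    for tok in TOKEN_RE.findall(text):
        modern = lexicon.get(tok.lower())
        if modern is None:
            out.append(tok)  # unknown or already-modern token: leave as-is
        elif tok[0].isupper():
            out.append(modern[0].upper() + modern[1:])
        else:
            out.append(modern)
    return "".join(out)


print(normalise("They haue seene it vpon the shore."))
# They have seen it upon the shore.
```

A context-free lookup like this cannot decide cases where the modern equivalent depends on the surrounding text (for example *doe* as the verb *do* versus the noun *doe*); decisions of that kind are where an explicit, linguistically motivated rationale for normalisation becomes necessary.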

RIS

TY  - JOUR
T1  - Guidelines for normalising early modern English corpora: decisions and justifications
AU  - Archer, Dawn
AU  - Kytö, Merja
AU  - Baron, Alistair
AU  - Rayson, Paul Edward
N1  - © 2015. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. (CC BY-NC-ND 3.0)
PY  - 2015/3/1
Y1  - 2015/3/1
N2  - Corpora of Early Modern English have been collected and released for research for a number of years. With large scale digitisation activities gathering pace in the last decade, much more historical textual data is now available for research on numerous topics including historical linguistics and conceptual history. We summarise previous research which has shown that it is necessary to map historical spelling variants to modern equivalents in order to successfully apply natural language processing and corpus linguistics methods. Manual and semiautomatic methods have been devised to support this normalisation and standardisation process. We argue that it is important to develop a linguistically meaningful rationale to achieve good results from this process. In order to do so, we propose a number of guidelines for normalising corpora and show how these guidelines have been applied in the Corpus of English Dialogues.
AB  - Corpora of Early Modern English have been collected and released for research for a number of years. With large scale digitisation activities gathering pace in the last decade, much more historical textual data is now available for research on numerous topics including historical linguistics and conceptual history. We summarise previous research which has shown that it is necessary to map historical spelling variants to modern equivalents in order to successfully apply natural language processing and corpus linguistics methods. Manual and semiautomatic methods have been devised to support this normalisation and standardisation process. We argue that it is important to develop a linguistically meaningful rationale to achieve good results from this process. In order to do so, we propose a number of guidelines for normalising corpora and show how these guidelines have been applied in the Corpus of English Dialogues.
U2  - 10.1515/icame-2015-0001
DO  - 10.1515/icame-2015-0001
M3  - Journal article
VL  - 39
SP  - 5
EP  - 24
JO  - ICAME Journal
JF  - ICAME Journal
SN  - 1502-5462
IS  - 1
ER  -
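The RIS record is a line-oriented tagged format: each line carries a two-character tag, a hyphen separator and a value, repeatable tags such as `AU` occur once per author, and `ER` closes the record. As a rough sketch, assuming Python 3.9+ and a single record per input, it could be read into a dictionary as below. `parse_ris` is a hypothetical helper, not part of any reference-manager library, and the regex deliberately tolerates the single-space separator that appears when whitespace has been collapsed.

```python
import re
from collections import defaultdict

# Standard RIS lines look like "XX  - value" (tag, two spaces, hyphen, space).
# \s+ and \s? also accept collapsed-whitespace variants such as "TY - JOUR".
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text: str) -> dict[str, list[str]]:
    """Parse one RIS record into tag -> list of values, preserving the
    order of repeatable tags such as AU. Stops at the ER (end) tag."""
    record: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        m = RIS_LINE.match(line.strip())
        if not m:
            continue  # skip blank lines and anything that is not a tagged line
        tag, value = m.group(1), m.group(2).strip()
        if tag == "ER":
            break  # end of record
        record[tag].append(value)
    return dict(record)


sample = """TY  - JOUR
AU  - Archer, Dawn
AU  - Kytö, Merja
AU  - Baron, Alistair
AU  - Rayson, Paul Edward
VL  - 39
SP  - 5
EP  - 24
ER  -"""

rec = parse_ris(sample)
print(rec["AU"])
print(f'pp. {rec["SP"][0]}-{rec["EP"][0]}')
```

Keeping every value in a list, rather than only repeatable tags, keeps the helper uniform at the cost of an extra `[0]` when reading single-valued tags such as `VL` or `SP`.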