Electronic data

  • Butler_et_al_JMGL_Finalsub

    Rights statement: This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of Map and Geography Libraries on 11/05/2017, available online: http://www.tandfonline.com/10.1080/15420353.2017.1307304

    Accepted author manuscript, 318 KB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Alts, Abbreviations, and AKAs: historical onomastic variation and automated named entity recognition

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Alts, Abbreviations, and AKAs: historical onomastic variation and automated named entity recognition. / Butler, James Odelle; Donaldson, Christopher Elliott; Taylor, Joanna Elizabeth et al.
In: Journal of Map and Geography Libraries, Vol. 13, No. 1, 30.05.2017, p. 58-81.

Vancouver

Butler JO, Donaldson CE, Taylor JE, Gregory IN. Alts, Abbreviations, and AKAs: historical onomastic variation and automated named entity recognition. Journal of Map and Geography Libraries. 2017 May 30;13(1):58-81. Epub 2017 May 11. doi: 10.1080/15420353.2017.1307304

Bibtex

@article{6351a7e67a714bfcac5b044b14e001d0,
title = "Alts, Abbreviations, and AKAs: historical onomastic variation and automated named entity recognition",
abstract = "The accurate automated identification of named places is a major concern for scholars in the digital humanities, and especially for those engaged in research that depends upon the gazetteer-led recognition of specific aspects. The field of onomastics examines the linguistic roots and historical development of names, which have for the most part only standardised into single officially recognised forms since the late nineteenth century. Even slight spelling variations can introduce errors in geotagging techniques, and these differences in place-name spellings are thus vital considerations when seeking high rates of correct geospatial identification in historical texts. This article offers an overview of typical name-based variation that can cause issues in the accurate geotagging of any historical resource. The article argues that the careful study and documentation of these variations can assist in the development of more complete onymic records, which in turn may inform geotaggers through a cycle of variational recognition. It demonstrates how patterns in regional naming variation and development, across both specific and generic name elements, can be identified through the historical records of each known location. The article uses examples taken from a digitised corpus of writing about the English Lake District, a collection of 80 texts that date from between 1622 and 1900. Four of the more complex spelling-based problems encountered during the creation of a manual gazetteer for this corpus are examined. Specifically, the article demonstrates how and why such variation must be expected, particularly in the years preceding the standardisation of place-name spellings. It suggests how procedural developments may be undertaken to account for such georeferential issues in the Named Entity Recognition strategies employed by future projects. 
Similarly, the benefits of such multi-genre corpora to assist in completing onomastic records is also shown through examples of new name forms discovered for prominent sites in the Lake District. This focus is accompanied by a discussion of the influence of literary works on place-name standardisation – an aspect not typically accounted for in traditional onomastic study – to illustrate the extent to which authorial interests in regional toponymic histories can influence linguistic development.",
keywords = "name studies, historical linguistics, GIS, gazetteers, language variation, environmental analysis, linguistic studies",
author = "Butler, {James Odelle} and Donaldson, {Christopher Elliott} and Taylor, {Joanna Elizabeth} and Gregory, {Ian Norman}",
note = "This is an Accepted Manuscript of an article published by Taylor \& Francis in Journal of Map and Geography Libraries on 11/05/2017, available online: http://www.tandfonline.com/10.1080/15420353.2017.1307304",
year = "2017",
month = may,
day = "30",
doi = "10.1080/15420353.2017.1307304",
language = "English",
volume = "13",
pages = "58--81",
journal = "Journal of Map and Geography Libraries",
issn = "1542-0353",
publisher = "Taylor \& Francis",
number = "1",

}

RIS

TY - JOUR

T1 - Alts, Abbreviations, and AKAs: historical onomastic variation and automated named entity recognition

AU - Butler, James Odelle

AU - Donaldson, Christopher Elliott

AU - Taylor, Joanna Elizabeth

AU - Gregory, Ian Norman

N1 - This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of Map and Geography Libraries on 11/05/2017, available online: http://www.tandfonline.com/10.1080/15420353.2017.1307304

PY - 2017/5/30

Y1 - 2017/5/30

N2 - The accurate automated identification of named places is a major concern for scholars in the digital humanities, and especially for those engaged in research that depends upon the gazetteer-led recognition of specific aspects. The field of onomastics examines the linguistic roots and historical development of names, which have for the most part only standardised into single officially recognised forms since the late nineteenth century. Even slight spelling variations can introduce errors in geotagging techniques, and these differences in place-name spellings are thus vital considerations when seeking high rates of correct geospatial identification in historical texts. This article offers an overview of typical name-based variation that can cause issues in the accurate geotagging of any historical resource. The article argues that the careful study and documentation of these variations can assist in the development of more complete onymic records, which in turn may inform geotaggers through a cycle of variational recognition. It demonstrates how patterns in regional naming variation and development, across both specific and generic name elements, can be identified through the historical records of each known location. The article uses examples taken from a digitised corpus of writing about the English Lake District, a collection of 80 texts that date from between 1622 and 1900. Four of the more complex spelling-based problems encountered during the creation of a manual gazetteer for this corpus are examined. Specifically, the article demonstrates how and why such variation must be expected, particularly in the years preceding the standardisation of place-name spellings. It suggests how procedural developments may be undertaken to account for such georeferential issues in the Named Entity Recognition strategies employed by future projects. 
Similarly, the benefits of such multi-genre corpora to assist in completing onomastic records is also shown through examples of new name forms discovered for prominent sites in the Lake District. This focus is accompanied by a discussion of the influence of literary works on place-name standardisation – an aspect not typically accounted for in traditional onomastic study – to illustrate the extent to which authorial interests in regional toponymic histories can influence linguistic development.

AB - The accurate automated identification of named places is a major concern for scholars in the digital humanities, and especially for those engaged in research that depends upon the gazetteer-led recognition of specific aspects. The field of onomastics examines the linguistic roots and historical development of names, which have for the most part only standardised into single officially recognised forms since the late nineteenth century. Even slight spelling variations can introduce errors in geotagging techniques, and these differences in place-name spellings are thus vital considerations when seeking high rates of correct geospatial identification in historical texts. This article offers an overview of typical name-based variation that can cause issues in the accurate geotagging of any historical resource. The article argues that the careful study and documentation of these variations can assist in the development of more complete onymic records, which in turn may inform geotaggers through a cycle of variational recognition. It demonstrates how patterns in regional naming variation and development, across both specific and generic name elements, can be identified through the historical records of each known location. The article uses examples taken from a digitised corpus of writing about the English Lake District, a collection of 80 texts that date from between 1622 and 1900. Four of the more complex spelling-based problems encountered during the creation of a manual gazetteer for this corpus are examined. Specifically, the article demonstrates how and why such variation must be expected, particularly in the years preceding the standardisation of place-name spellings. It suggests how procedural developments may be undertaken to account for such georeferential issues in the Named Entity Recognition strategies employed by future projects. 
Similarly, the benefits of such multi-genre corpora to assist in completing onomastic records is also shown through examples of new name forms discovered for prominent sites in the Lake District. This focus is accompanied by a discussion of the influence of literary works on place-name standardisation – an aspect not typically accounted for in traditional onomastic study – to illustrate the extent to which authorial interests in regional toponymic histories can influence linguistic development.

KW - name studies

KW - historical linguistics

KW - GIS

KW - gazetteers

KW - language variation

KW - environmental analysis

KW - linguistic studies

U2 - 10.1080/15420353.2017.1307304

DO - 10.1080/15420353.2017.1307304

M3 - Journal article

VL - 13

SP - 58

EP - 81

JO - Journal of Map and Geography Libraries

JF - Journal of Map and Geography Libraries

SN - 1542-0353

IS - 1

ER -