
Electronic data

  • Won_Murrieta_Martins_2018

    Final published version, 3.22 MB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License


Ensemble Named Entity Recognition (NER): Evaluating NER Tools in the Identification of Place Names in Historical Corpora

Research output: Contribution to Journal/Magazine > Journal article > peer-review

Published

Standard

Ensemble Named Entity Recognition (NER): Evaluating NER Tools in the Identification of Place Names in Historical Corpora. / Murrieta-Flores, Patricia.
In: Frontiers in Digital Humanities, Vol. 5, 2, 09.03.2018.



Bibtex

@article{4cf80c223d4246f78cc8a200112f4f9b,
title = "Ensemble Named Entity Recognition (NER): Evaluating NER Tools in the Identification of Place Names in Historical Corpora",
abstract = "The field of Spatial Humanities has advanced substantially in the past years. The identification and extraction of toponyms and spatial information mentioned in historical text collections has allowed its use in innovative ways, making possible the application of spatial analysis and the mapping of these places with geographic information systems. For instance, automated place name identification is possible with Named Entity Recognition (NER) systems. Statistical NER methods based on supervised learning, in particular, are highly successful with modern datasets. However, there are still major challenges to address when dealing with historical corpora. These challenges include language changes over time, spelling variations, transliterations, OCR errors, and sources written in multiple languages among others. In this article, considering a task of place name recognition over two collections of historical correspondence, we report an evaluation of five NER systems and an approach that combines these through a voting system. We found that although individual performance of each NER system was corpus dependent, the ensemble combination was able to achieve consistent measures of precision and recall, outperforming the individual NER systems. In addition, the results showed that these NER systems are not strongly dependent on preprocessing and translation to Modern English.",
keywords = "Spatial Humanities, Digital Humanities, Natural Language processing, named entity recognition, history, Early Modern English, early modern history, Republic of Letters, toponym recognition",
author = "Patricia Murrieta-Flores",
year = "2018",
month = mar,
day = "9",
doi = "10.3389/fdigh.2018.00002",
language = "English",
volume = "5",
number = "2",
journal = "Frontiers in Digital Humanities",
issn = "2297-2668",
publisher = "Frontiers Media",

}

RIS

TY - JOUR
T1 - Ensemble Named Entity Recognition (NER)
T2 - Evaluating NER Tools in the Identification of Place Names in Historical Corpora
AU - Murrieta-Flores, Patricia
PY - 2018/3/9
Y1 - 2018/3/9

AB - The field of Spatial Humanities has advanced substantially in the past years. The identification and extraction of toponyms and spatial information mentioned in historical text collections has allowed its use in innovative ways, making possible the application of spatial analysis and the mapping of these places with geographic information systems. For instance, automated place name identification is possible with Named Entity Recognition (NER) systems. Statistical NER methods based on supervised learning, in particular, are highly successful with modern datasets. However, there are still major challenges to address when dealing with historical corpora. These challenges include language changes over time, spelling variations, transliterations, OCR errors, and sources written in multiple languages among others. In this article, considering a task of place name recognition over two collections of historical correspondence, we report an evaluation of five NER systems and an approach that combines these through a voting system. We found that although individual performance of each NER system was corpus dependent, the ensemble combination was able to achieve consistent measures of precision and recall, outperforming the individual NER systems. In addition, the results showed that these NER systems are not strongly dependent on preprocessing and translation to Modern English.

KW - Spatial Humanities
KW - Digital Humanities
KW - Natural Language processing
KW - named entity recognition
KW - history
KW - Early Modern English
KW - early modern history
KW - Republic of Letters
KW - toponym recognition
U2 - 10.3389/fdigh.2018.00002
DO - 10.3389/fdigh.2018.00002
M3 - Journal article
VL - 5
JO - Frontiers in Digital Humanities
JF - Frontiers in Digital Humanities
SN - 2297-2668
M1 - 2
ER -