Accepted author manuscript, 1.31 MB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
}
TY - GEN
T1 - An automated approach for geocoding tabular itineraries
AU - Santos, Rui
AU - Murrieta-Flores, Patricia
AU - Martins, Bruno
PY - 2017/11/30
Y1 - 2017/11/30
AB - Historical itineraries, often accessible as lists or tables describing places visited in sequence, are abundant resources and also important objects of study for humanities scholars. This article advances a novel method for automatically geocoding tabular itineraries, combining approximate string matching with a cost optimization algorithm based on dynamic programming. Experiments with a dataset of historical itineraries, with ground-truth geocoding annotations provided by domain experts and leveraging also the GeoNames gazetteer, attest to the effectiveness of the proposed method. The obtained results show that while approximate string matching can already achieve very low median errors, with many toponyms matching exactly against GeoNames entries, the combination with cost optimization can significantly improve results in terms of the average distance towards the correct disambiguations.
KW - Automated geocoding
KW - Digital humanities
KW - Dynamic programming
KW - Geographic information retrieval
KW - Toponym matching
U2 - 10.1145/3155902.3155908
DO - 10.1145/3155902.3155908
M3 - Conference contribution/Paper
BT - GIR'17 Proceedings of the 11th Workshop on Geographic Information Retrieval
PB - Association for Computing Machinery, Inc
CY - New York
T2 - 11th Workshop on Geographic Information Retrieval, GIR 2017
Y2 - 30 November 2017 through 1 December 2017
ER -