Toponym matching through deep neural networks

Electronic data

Manusc_Toponym_Matching_Through_Deep_Neural_Networks
Rights statement: This is an Accepted Manuscript of an article published by Taylor & Francis in International Journal of Geographical Information Science on 31/10/2017, available online: http://www.tandfonline.com/10.1080/13658816.2017.1390119
Accepted author manuscript, 4.39 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Text available via DOI:

https://doi.org/10.1080/13658816.2017.1390119
Final published version

Keywords

approximate string matching, deep neural networks, duplicate detection, geographic information retrieval, recurrent neural networks, Toponym matching

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Toponym matching through deep neural networks. / Santos, Rui; Murrieta-Flores, Patricia; Calado, Pável et al.
In: International Journal of Geographical Information Science, Vol. 32, No. 2, 2017, p. 324-348.

Harvard

Santos, R, Murrieta-Flores, P, Calado, P & Martins, B 2017, 'Toponym matching through deep neural networks', International Journal of Geographical Information Science, vol. 32, no. 2, pp. 324-348. https://doi.org/10.1080/13658816.2017.1390119

APA

Santos, R., Murrieta-Flores, P., Calado, P., & Martins, B. (2017). Toponym matching through deep neural networks. International Journal of Geographical Information Science, 32(2), 324-348. https://doi.org/10.1080/13658816.2017.1390119

Vancouver

Santos R, Murrieta-Flores P, Calado P, Martins B. Toponym matching through deep neural networks. International Journal of Geographical Information Science. 2017;32(2):324-348. Epub 2017 Oct 31. doi: 10.1080/13658816.2017.1390119

Author

Santos, Rui ; Murrieta-Flores, Patricia ; Calado, Pável et al. / Toponym matching through deep neural networks. In: International Journal of Geographical Information Science. 2017 ; Vol. 32, No. 2. pp. 324-348.

Bibtex

@article{fb3ee85bb72549248d9ee3410829a913,

title = "Toponym matching through deep neural networks",

abstract = "Toponym matching, i.e. pairing strings that represent the same real-world location, is a fundamental problem for several practical applications. The current state-of-the-art relies on string similarity metrics, either specifically developed for matching place names or integrated within methods that combine multiple metrics. However, these methods all rely on common sub-strings in order to establish similarity, and they do not effectively capture the character replacements involved in toponym changes due to transliterations or to changes in language and culture over time. In this article, we present a novel matching approach, leveraging a deep neural network to classify pairs of toponyms as either matching or nonmatching. The proposed network architecture uses recurrent nodes to build representations from the sequences of bytes that correspond to the strings that are to be matched. These representations are then combined and passed to feed-forward nodes, finally leading to a classification decision. We present the results of a wide-ranging evaluation on the performance of the proposed method, using a large dataset collected from the GeoNames gazetteer. These results show that the proposed method can significantly outperform individual similarity metrics from previous studies, as well as previous methods based on supervised machine learning for combining multiple metrics.",

keywords = "approximate string matching, deep neural networks, duplicate detection, geographic information retrieval, recurrent neural networks, Toponym matching",

author = "Rui Santos and Patricia Murrieta-Flores and P{\'a}vel Calado and Bruno Martins",

note = "This is an Accepted Manuscript of an article published by Taylor \& Francis in International Journal of Geographical Information Science on 31/10/2017, available online: http://www.tandfonline.com/10.1080/13658816.2017.1390119",

year = "2017",

doi = "10.1080/13658816.2017.1390119",

language = "English",

volume = "32",

pages = "324--348",

journal = "International Journal of Geographical Information Science",

issn = "1365-8816",

publisher = "Taylor and Francis Ltd.",

number = "2",

}
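The abstract's remark that substring-based metrics miss transliterations can be illustrated with a small sketch. It uses Python's standard `difflib.SequenceMatcher` as a stand-in for a character-overlap similarity metric; this is an assumption for illustration, not one of the metrics evaluated in the article. The helper names and the example toponyms are hypothetical choices as well. The second helper shows the UTF-8 byte sequence that a byte-level model would read instead of characters.

```python
from difflib import SequenceMatcher


def overlap_similarity(a: str, b: str) -> float:
    """Character-overlap similarity in [0, 1] (stand-in metric, not from the article)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def to_bytes(name: str) -> list[int]:
    """UTF-8 byte values of a toponym, the kind of input a byte-level model reads."""
    return list(name.encode("utf-8"))


# Same city written in two scripts: the strings share no characters,
# so an overlap-based score cannot see that they match.
cross_script = overlap_similarity("Moscow", "Москва")

# Two different places whose names share most characters score higher.
look_alike = overlap_similarity("Paris", "Parys")

print(cross_script, look_alike)
print(to_bytes("Москва"))
```

A metric of this kind ranks the look-alike pair above the transliterated pair, which is the failure mode the abstract attributes to methods built on common sub-strings.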

RIS

TY - JOUR

T1 - Toponym matching through deep neural networks

AU - Santos, Rui

AU - Murrieta-Flores, Patricia

AU - Calado, Pável

AU - Martins, Bruno

N1 - This is an Accepted Manuscript of an article published by Taylor & Francis in International Journal of Geographical Information Science on 31/10/2017, available online: http://www.tandfonline.com/10.1080/13658816.2017.1390119

PY - 2017

Y1 - 2017

N2 - Toponym matching, i.e. pairing strings that represent the same real-world location, is a fundamental problem for several practical applications. The current state-of-the-art relies on string similarity metrics, either specifically developed for matching place names or integrated within methods that combine multiple metrics. However, these methods all rely on common sub-strings in order to establish similarity, and they do not effectively capture the character replacements involved in toponym changes due to transliterations or to changes in language and culture over time. In this article, we present a novel matching approach, leveraging a deep neural network to classify pairs of toponyms as either matching or nonmatching. The proposed network architecture uses recurrent nodes to build representations from the sequences of bytes that correspond to the strings that are to be matched. These representations are then combined and passed to feed-forward nodes, finally leading to a classification decision. We present the results of a wide-ranging evaluation on the performance of the proposed method, using a large dataset collected from the GeoNames gazetteer. These results show that the proposed method can significantly outperform individual similarity metrics from previous studies, as well as previous methods based on supervised machine learning for combining multiple metrics.

AB - Toponym matching, i.e. pairing strings that represent the same real-world location, is a fundamental problem for several practical applications. The current state-of-the-art relies on string similarity metrics, either specifically developed for matching place names or integrated within methods that combine multiple metrics. However, these methods all rely on common sub-strings in order to establish similarity, and they do not effectively capture the character replacements involved in toponym changes due to transliterations or to changes in language and culture over time. In this article, we present a novel matching approach, leveraging a deep neural network to classify pairs of toponyms as either matching or nonmatching. The proposed network architecture uses recurrent nodes to build representations from the sequences of bytes that correspond to the strings that are to be matched. These representations are then combined and passed to feed-forward nodes, finally leading to a classification decision. We present the results of a wide-ranging evaluation on the performance of the proposed method, using a large dataset collected from the GeoNames gazetteer. These results show that the proposed method can significantly outperform individual similarity metrics from previous studies, as well as previous methods based on supervised machine learning for combining multiple metrics.

KW - approximate string matching

KW - deep neural networks

KW - duplicate detection

KW - geographic information retrieval

KW - recurrent neural networks

KW - Toponym matching

U2 - 10.1080/13658816.2017.1390119

DO - 10.1080/13658816.2017.1390119

M3 - Journal article

AN - SCOPUS:85032681752

VL - 32

SP - 324

EP - 348

JO - International Journal of Geographical Information Science

JF - International Journal of Geographical Information Science

SN - 1365-8816

IS - 2

ER -
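The architecture outlined in the abstract (recurrent nodes over byte sequences, combined representations, feed-forward nodes, a matching/nonmatching decision) can be sketched as a forward pass in plain Python. This is a toy under stated assumptions, not the authors' implementation: the class name `ByteRNNMatcher`, the simple tanh recurrent cell, the concatenation used to combine the two encodings, the ReLU layer, the layer sizes, and the random untrained weights are all hypothetical choices, since this record does not specify them.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ByteRNNMatcher:
    """Untrained toy model: shared recurrent encoder plus feed-forward classifier."""

    def __init__(self, hidden=8, embed_dim=4, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.embed = rand_matrix(256, embed_dim, rng)      # one row per byte value
        self.w_in = rand_matrix(hidden, embed_dim, rng)    # input -> hidden
        self.w_rec = rand_matrix(hidden, hidden, rng)      # hidden -> hidden
        self.w_ff = rand_matrix(hidden, 2 * hidden, rng)   # combined -> feed-forward
        self.w_out = rand_matrix(1, hidden, rng)           # feed-forward -> logit

    def encode(self, name):
        """Run the recurrent cell over the UTF-8 bytes; return the final state."""
        h = [0.0] * self.hidden
        for byte in name.encode("utf-8"):
            x = matvec(self.w_in, self.embed[byte])
            r = matvec(self.w_rec, h)
            h = [math.tanh(a + b) for a, b in zip(x, r)]
        return h

    def match_probability(self, a, b):
        """Combine both encodings and map them to a probability of 'matching'."""
        combined = self.encode(a) + self.encode(b)          # list concatenation
        ff = [max(0.0, z) for z in matvec(self.w_ff, combined)]  # ReLU layer
        logit = matvec(self.w_out, ff)[0]
        return 1.0 / (1.0 + math.exp(-logit))


model = ByteRNNMatcher()
p = model.match_probability("Lisboa", "Lisbon")
print(f"untrained match probability: {p:.3f}")
```

Because the weights are random, the printed probability carries no meaning; the sketch only shows how data flows from bytes to a classification score. Both names pass through the same encoder, so the model treats the two inputs with shared parameters, which is one common design for pairwise matching.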
