Automatic Restoration of Diacritics for Igbo Language

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN; Conference contribution/Paper; peer-review

Published

Standard

Automatic Restoration of Diacritics for Igbo Language. / Ezeani, Ignatius; Hepple, Mark; Onyenwe, Ikechukwu; Sojka, Petr (Editor); Horák, Aleš (Editor); Kopeček, Ivan (Editor); Pala, Karel (Editor).

International Conference on Text, Speech, and Dialogue: TSD 2016: Text, Speech, and Dialogue. Springer, 2016. p. 198-205.

Harvard

Ezeani, I, Hepple, M, Onyenwe, I, Sojka, P (ed.), Horák, A (ed.), Kopeček, I (ed.) & Pala, K (ed.) 2016, Automatic Restoration of Diacritics for Igbo Language. in International Conference on Text, Speech, and Dialogue: TSD 2016: Text, Speech, and Dialogue. Springer, pp. 198-205. https://doi.org/10.1007/978-3-319-45510-5

APA

Ezeani, I., Hepple, M., Onyenwe, I., Sojka, P. (Ed.), Horák, A. (Ed.), Kopeček, I. (Ed.), & Pala, K. (Ed.) (2016). Automatic Restoration of Diacritics for Igbo Language. In International Conference on Text, Speech, and Dialogue: TSD 2016: Text, Speech, and Dialogue (pp. 198-205). Springer. https://doi.org/10.1007/978-3-319-45510-5

Vancouver

Ezeani I, Hepple M, Onyenwe I, Sojka P, (ed.), Horák A, (ed.), Kopeček I, (ed.) et al. Automatic Restoration of Diacritics for Igbo Language. In International Conference on Text, Speech, and Dialogue: TSD 2016: Text, Speech, and Dialogue. Springer. 2016. p. 198-205 https://doi.org/10.1007/978-3-319-45510-5

Author

Ezeani, Ignatius ; Hepple, Mark ; Onyenwe, Ikechukwu ; Sojka, Petr (Editor) ; Horák, Aleš (Editor) ; Kopeček, Ivan (Editor) ; Pala, Karel (Editor). / Automatic Restoration of Diacritics for Igbo Language. International Conference on Text, Speech, and Dialogue: TSD 2016: Text, Speech, and Dialogue. Springer, 2016. pp. 198-205

Bibtex

@inproceedings{602b6c14f5a146298fcbd571f99f40ba,
title = "Automatic Restoration of Diacritics for Igbo Language",
abstract = "Igbo is a low-resource African language with orthographic and tonal diacritics, which capture distinctions between words that are important for both meaning and pronunciation, and hence of potential value for a range of language processing tasks. Such diacritics, however, are often largely absent from the electronic texts we might want to process, or assemble into corpora, and so the need arises for effective methods for automatic diacritic restoration for Igbo. In this paper, we experiment using an Igbo Bible corpus, which is extensively marked for vowel distinctions, and partially for tonal distinctions, and attempt the task of reinstating these diacritics when they have been deleted. We investigate a number of word-level diacritic restoration methods, based on n-grams, under a closed-world assumption, achieving an accuracy of 98.83% with our most effective method.",
author = "Ignatius Ezeani and Mark Hepple and Ikechukwu Onyenwe",
editor = "Petr Sojka and Ale{\v s} Hor{\'a}k and Ivan Kope{\v c}ek and Karel Pala",
year = "2016",
month = jan,
day = "1",
doi = "10.1007/978-3-319-45510-5",
language = "English",
isbn = "9783319455099",
pages = "198--205",
booktitle = "International Conference on Text, Speech, and Dialogue",
publisher = "Springer",
}

RIS

TY - GEN

T1 - Automatic Restoration of Diacritics for Igbo Language

AU - Ezeani, Ignatius

AU - Hepple, Mark

AU - Onyenwe, Ikechukwu

A2 - Sojka, Petr

A2 - Horák, Aleš

A2 - Kopeček, Ivan

A2 - Pala, Karel

PY - 2016/1/1

Y1 - 2016/1/1

N2 - Igbo is a low-resource African language with orthographic and tonal diacritics, which capture distinctions between words that are important for both meaning and pronunciation, and hence of potential value for a range of language processing tasks. Such diacritics, however, are often largely absent from the electronic texts we might want to process, or assemble into corpora, and so the need arises for effective methods for automatic diacritic restoration for Igbo. In this paper, we experiment using an Igbo Bible corpus, which is extensively marked for vowel distinctions, and partially for tonal distinctions, and attempt the task of reinstating these diacritics when they have been deleted. We investigate a number of word-level diacritic restoration methods, based on n-grams, under a closed-world assumption, achieving an accuracy of 98.83% with our most effective method.

U2 - 10.1007/978-3-319-45510-5

DO - 10.1007/978-3-319-45510-5

M3 - Conference contribution/Paper

SN - 9783319455099

SP - 198

EP - 205

BT - International Conference on Text, Speech, and Dialogue

PB - Springer

ER -
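
Illustration

The abstract describes word-level diacritic restoration based on n-grams but does not give the models themselves. The following is a minimal sketch of one way such a method could look, assuming a bigram lookup that backs off to unigram counts; it is not the authors' implementation. The function names (strip_diacritics, train, restore), the "<s>" start marker, and the choice to pass unseen words through unchanged are all assumptions made for illustration.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(word):
    """Remove combining marks (dot-below, tone accents, ...) from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


def train(tokens):
    """Count, for each stripped form, which marked forms it came from.

    unigrams maps stripped word -> Counter of marked forms.
    bigrams maps (previous marked word, stripped word) -> Counter of marked forms.
    """
    unigrams = defaultdict(Counter)
    bigrams = defaultdict(Counter)
    prev = "<s>"  # hypothetical start-of-text marker
    for tok in tokens:
        tok = unicodedata.normalize("NFC", tok)
        key = strip_diacritics(tok)
        unigrams[key][tok] += 1
        bigrams[(prev, key)][tok] += 1
        prev = tok
    return unigrams, bigrams


def restore(stripped_tokens, unigrams, bigrams):
    """Pick the most frequent marked form, preferring bigram evidence."""
    restored = []
    prev = "<s>"
    for key in stripped_tokens:
        if (prev, key) in bigrams:
            choice = bigrams[(prev, key)].most_common(1)[0][0]
        elif key in unigrams:
            choice = unigrams[key].most_common(1)[0][0]
        else:
            choice = key  # assumption: unseen words are left unchanged
        restored.append(choice)
        prev = choice
    return restored
```

In this sketch, training on diacritic-marked text and then calling restore on the stripped tokens mirrors the task setup in the abstract (delete the diacritics, then try to reinstate them). Accuracy would be the fraction of tokens for which the restored form matches the original.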