Automatic Restoration of Diacritics for Igbo Language

Computing and Communications

Associated organisational unit

UCREL - University Centre for Computer Corpus Research on Language

Text available via DOI:

https://doi.org/10.1007/978-3-319-45510-5
Final published version


Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Ignatius Ezeani
Mark Hepple
Ikechukwu Onyenwe
Petr Sojka (Editor)
Aleš Horák (Editor)
Ivan Kopeček (Editor)
Karel Pala (Editor)


Publication date	1/01/2016
Host publication	International Conference on Text, Speech, and Dialogue: TSD 2016: Text, Speech, and Dialogue
Publisher	Springer
Pages	198-205
Number of pages	8
ISBN (print)	9783319455099
Original language	English

Abstract

Igbo is a low-resource African language with orthographic and tonal diacritics, which capture distinctions between words that are important for both meaning and pronunciation, and hence of potential value for a range of language processing tasks. Such diacritics, however, are often largely absent from the electronic texts we might want to process or assemble into corpora, so effective methods for automatic diacritic restoration for Igbo are needed. In this paper, we experiment with an Igbo Bible corpus, which is extensively marked for vowel distinctions and partially marked for tonal distinctions, and attempt the task of reinstating these diacritics after they have been deleted. We investigate a number of word-level diacritic restoration methods based on n-grams, under a closed-world assumption, achieving an accuracy of 98.83% with our most effective method.
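The word-level, n-gram-based restoration described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `BigramRestorer`, the greedy left-to-right decoding, the back-off from bigram to unigram counts, and the toy training sentences are all assumptions made for illustration. It respects a closed-world assumption in one plausible sense: a stripped word can only be restored to a diacritized form that was observed in training, and unseen words are left unchanged. Diacritics are stripped by Unicode NFD decomposition followed by removal of combining marks, which covers dot-below vowels (ị, ọ, ụ), ṅ, and acute/grave tone marks.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(text: str) -> str:
    """Remove combining marks after NFD decomposition (dot below, dot above, tone accents)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class BigramRestorer:
    """Hypothetical word-level restorer (illustration only, not the paper's method).

    Closed world: candidates for a stripped word are only the diacritized forms
    seen in training. Scoring prefers the bigram count with the previously
    restored word and falls back to the unigram count to break ties.
    """

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.candidates = defaultdict(set)  # stripped form -> diacritized forms seen
        for sent in sentences:
            prev = "<s>"  # sentence-start marker
            for raw in sent.split():
                w = unicodedata.normalize("NFC", raw)
                self.unigrams[w] += 1
                self.bigrams[(prev, w)] += 1
                self.candidates[strip_diacritics(w)].add(w)
                prev = w

    def restore(self, stripped_sentence: str) -> str:
        prev, output = "<s>", []
        for token in stripped_sentence.split():
            options = self.candidates.get(strip_diacritics(token))
            if not options:
                choice = token  # never seen in training: leave as-is
            else:
                # sorted() makes tie-breaking deterministic
                choice = max(
                    sorted(options),
                    key=lambda w: (self.bigrams[(prev, w)], self.unigrams[w]),
                )
            output.append(choice)
            prev = choice
        return " ".join(output)


# Toy training sentences (illustrative tokens only, not corpus data)
train = ["o zụrụ ákwà ọhụrụ", "ọkụkọ yiri àkwá", "o zụrụ ákwà ọzọ"]
restorer = BigramRestorer(train)
print(restorer.restore("o zuru akwa"))
print(restorer.restore("okuko yiri akwa"))
```

Here the stripped form `akwa` has two candidates, and the preceding word decides between them. A practical system would use a much larger corpus and could add longer n-gram contexts and smoothing; the specific n-gram variants evaluated in the paper are not reproduced here.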
