Igbo Diacritic Restoration using Embedding Models.

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN; Conference contribution/Paper; peer-review

Published

Standard

Igbo Diacritic Restoration using Embedding Models. / Ezeani, Ignatius; Hepple, Mark; Onyenwe, Ikechukwu E. et al.
NAACL-HLT (Student Research Workshop). 2018. p. 54-60.

Harvard

Ezeani, I, Hepple, M, Onyenwe, IE & Chioma, E 2018, Igbo Diacritic Restoration using Embedding Models. in NAACL-HLT (Student Research Workshop). pp. 54-60. https://doi.org/10.18653/v1/n18-4008

APA

Ezeani, I., Hepple, M., Onyenwe, I. E., & Chioma, E. (2018). Igbo Diacritic Restoration using Embedding Models. In NAACL-HLT (Student Research Workshop) (pp. 54-60). https://doi.org/10.18653/v1/n18-4008

Vancouver

Ezeani I, Hepple M, Onyenwe IE, Chioma E. Igbo Diacritic Restoration using Embedding Models. In NAACL-HLT (Student Research Workshop). 2018. p. 54-60 doi: 10.18653/v1/n18-4008

Author

Ezeani, Ignatius ; Hepple, Mark ; Onyenwe, Ikechukwu E. et al. / Igbo Diacritic Restoration using Embedding Models. NAACL-HLT (Student Research Workshop). 2018. pp. 54-60

Bibtex

@inproceedings{29c92e4b8a1c4243a81f4cb7019c9232,
title = "Igbo Diacritic Restoration using Embedding Models.",
abstract = "Igbo is a low-resource language spoken by approximately 30 million people worldwide. It is the native language of the Igbo people of south-eastern Nigeria. In the Igbo language, diacritics - orthographic and tonal - play a huge role in distinguishing the meaning and pronunciation of words. Omitting diacritics in texts often leads to lexical ambiguity. Diacritic restoration is a pre-processing task that replaces missing diacritics on words from which they have been removed. In this work, we applied embedding models to the diacritic restoration task and compared their performances to those of n-gram models. Although word embedding models have been successfully applied to various NLP tasks, they have not been used, to our knowledge, for diacritic restoration. Two classes of word embedding models were used: those projected from the English embedding space and those trained with an Igbo Bible corpus (≈ 1m). Our best result, 82.49%, is an improvement on the baseline n-gram models.",
author = "Ignatius Ezeani and Mark Hepple and Onyenwe, {Ikechukwu E.} and Enemouh Chioma",
note = "DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.",
year = "2018",
month = jun,
doi = "10.18653/v1/n18-4008",
language = "English",
pages = "54--60",
booktitle = "NAACL-HLT (Student Research Workshop)",

}

RIS

TY - GEN

T1 - Igbo Diacritic Restoration using Embedding Models.

AU - Ezeani, Ignatius

AU - Hepple, Mark

AU - Onyenwe, Ikechukwu E.

AU - Chioma, Enemouh

N1 - DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.

PY - 2018/6

Y1 - 2018/6

N2 - Igbo is a low-resource language spoken by approximately 30 million people worldwide. It is the native language of the Igbo people of south-eastern Nigeria. In the Igbo language, diacritics - orthographic and tonal - play a huge role in distinguishing the meaning and pronunciation of words. Omitting diacritics in texts often leads to lexical ambiguity. Diacritic restoration is a pre-processing task that replaces missing diacritics on words from which they have been removed. In this work, we applied embedding models to the diacritic restoration task and compared their performances to those of n-gram models. Although word embedding models have been successfully applied to various NLP tasks, they have not been used, to our knowledge, for diacritic restoration. Two classes of word embedding models were used: those projected from the English embedding space and those trained with an Igbo Bible corpus (≈ 1m). Our best result, 82.49%, is an improvement on the baseline n-gram models.

U2 - 10.18653/v1/n18-4008

DO - 10.18653/v1/n18-4008

M3 - Conference contribution/Paper

SP - 54

EP - 60

BT - NAACL-HLT (Student Research Workshop)

ER -