Toward an effective Igbo part-of-speech tagger

Computing and Communications

Associated organisational units

Electronic data

towards_effective_Igbo_pos_tagger
Rights statement: © ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Asian and Low-Resource Language Information Processing, 18, 4, 2019 http://doi.acm.org/10.1145/3314942
Accepted author manuscript, 914 KB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Text available via DOI:

https://doi.org/10.1145/3314942
Final published version

Keywords

African language, Corpora, Corpus annotation, Igbo, Language technology, Machine learning, Morphological analysis, Natural language processing (NLP), Part-of-speech (POS) tagging, POS tagger, Tagset, Text processing

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Toward an effective Igbo part-of-speech tagger. / Onyenwe, Ikechukwu E.; Hepple, Mark; Chinedu, Uchechukwu et al.
In: ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Vol. 18, No. 4, 42, 31.08.2019.

Harvard

Onyenwe, IE, Hepple, M, Chinedu, U & Ezeani, I 2019, 'Toward an effective Igbo part-of-speech tagger', ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), vol. 18, no. 4, 42. https://doi.org/10.1145/3314942

APA

Onyenwe, I. E., Hepple, M., Chinedu, U., & Ezeani, I. (2019). Toward an effective Igbo part-of-speech tagger. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(4), Article 42. https://doi.org/10.1145/3314942

Vancouver

Onyenwe IE, Hepple M, Chinedu U, Ezeani I. Toward an effective Igbo part-of-speech tagger. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP). 2019 Aug 31;18(4):42. Epub 2019 May 21. doi: 10.1145/3314942

Author

Onyenwe, Ikechukwu E. ; Hepple, Mark ; Chinedu, Uchechukwu et al. / Toward an effective Igbo part-of-speech tagger. In: ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP). 2019 ; Vol. 18, No. 4.

Bibtex

@article{547557ba5833425f82f0f917f0c1c34d,

title = "Toward an effective Igbo part-of-speech tagger",

abstract = "Part-of-speech (POS) tagging is a well-established technology for most Western European languages and a few other world languages, but it has not been evaluated on Igbo, an agglutinative African language. This article presents POS tagging experiments conducted using an Igbo corpus as a test bed for identifying the POS taggers and the Machine Learning (ML) methods that can achieve a good performance with the small dataset available for the language. Experiments have been conducted using different well-known POS taggers developed for English or European languages, and different training data styles and sizes. Igbo has a number of language-specific characteristics that present a challenge for effective POS tagging. One interesting case is the wide use of verbs (and nominalizations thereof) that have an inherent noun complement, which form “linked pairs” in the POS tagging scheme, but which may appear discontinuously. Another issue is Igbo's highly productive agglutinative morphology, which can produce many variant word forms from a given root. This productivity is a key cause of the out-of-vocabulary (OOV) words observed during Igbo tagging. We report results of experiments on a promising direction for improving tagging performance on such morphologically-inflected OOV words.",

keywords = "African language, Corpora, Corpus annotation, Igbo, Language technology, Machine learning, Morphological analysis, Natural language processing (NLP), Part-of-speech (POS) tagging, POS tagger, Tagset, Text processing",

author = "Onyenwe, {Ikechukwu E.} and Mark Hepple and Uchechukwu Chinedu and Ignatius Ezeani",

note = "{\textcopyright} ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Asian and Low-Resource Language Information Processing, 18, 4, 2019 http://doi.acm.org/10.1145/3314942",

year = "2019",

month = aug,

day = "31",

doi = "10.1145/3314942",

language = "English",

volume = "18",

journal = "ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)",

issn = "2375-4699",

publisher = "Association for Computing Machinery (ACM)",

number = "4",

articleno = "42",

}

RIS

TY - JOUR

T1 - Toward an effective Igbo part-of-speech tagger

AU - Onyenwe, Ikechukwu E.

AU - Hepple, Mark

AU - Chinedu, Uchechukwu

AU - Ezeani, Ignatius

N1 - © ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Asian and Low-Resource Language Information Processing, 18, 4, 2019 http://doi.acm.org/10.1145/3314942

PY - 2019/8/31

Y1 - 2019/8/31

N2 - Part-of-speech (POS) tagging is a well-established technology for most Western European languages and a few other world languages, but it has not been evaluated on Igbo, an agglutinative African language. This article presents POS tagging experiments conducted using an Igbo corpus as a test bed for identifying the POS taggers and the Machine Learning (ML) methods that can achieve a good performance with the small dataset available for the language. Experiments have been conducted using different well-known POS taggers developed for English or European languages, and different training data styles and sizes. Igbo has a number of language-specific characteristics that present a challenge for effective POS tagging. One interesting case is the wide use of verbs (and nominalizations thereof) that have an inherent noun complement, which form “linked pairs” in the POS tagging scheme, but which may appear discontinuously. Another issue is Igbo's highly productive agglutinative morphology, which can produce many variant word forms from a given root. This productivity is a key cause of the out-of-vocabulary (OOV) words observed during Igbo tagging. We report results of experiments on a promising direction for improving tagging performance on such morphologically-inflected OOV words.

KW - African language

KW - Corpora

KW - Corpus annotation

KW - Igbo

KW - Language technology

KW - Machine learning

KW - Morphological analysis

KW - Natural language processing (NLP)

KW - Part-of-speech (POS) tagging

KW - POS tagger

KW - Tagset

KW - Text processing

U2 - 10.1145/3314942

DO - 10.1145/3314942

M3 - Journal article

AN - SCOPUS:85073211790

VL - 18

JO - ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)

JF - ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)

SN - 2375-4699

IS - 4

M1 - 42

ER -
