Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - A basic language resource kit implementation for the IgboNLP project
AU - Onyenwe, Ikechukwu E.
AU - Hepple, Mark
AU - Chinedu, Uchechukwu
AU - Ezeani, Ignatius
PY - 2018/1/31
Y1 - 2018/1/31
N2 - Igbo, an African language with around 32 million speakers worldwide, is one of the many languages having few or none of the language processing resources needed for advanced language technology applications. In this article, we describe the approach taken to creating an initial set of resources for Igbo, including an electronic text corpus, a part-of-speech (POS) tagset, and a POS-tagged subcorpus. We discuss the approach taken in gathering texts, the preprocessing of these texts, and the development of the POS tagged corpus. We also discuss some of the problems encountered during corpus and tagset development and the solutions arrived at for these problems.
AB - Igbo, an African language with around 32 million speakers worldwide, is one of the many languages having few or none of the language processing resources needed for advanced language technology applications. In this article, we describe the approach taken to creating an initial set of resources for Igbo, including an electronic text corpus, a part-of-speech (POS) tagset, and a POS-tagged subcorpus. We discuss the approach taken in gathering texts, the preprocessing of these texts, and the development of the POS tagged corpus. We also discuss some of the problems encountered during corpus and tagset development and the solutions arrived at for these problems.
KW - African language
KW - Corpora
KW - Corpus annotation
KW - Human annotator
KW - Igbo
KW - Interannotation agreement
KW - Language technology
KW - Morphology
KW - Natural language processing (NLP)
KW - Normalization
KW - Part-of-speech (POS) tagging
KW - Segmentation
KW - Tagset
KW - Text processing
KW - Tokenization
U2 - 10.1145/3146387
DO - 10.1145/3146387
M3 - Journal article
AN - SCOPUS:85042519794
VL - 17
JO - ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
JF - ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
SN - 2375-4699
IS - 2
M1 - 10
ER -