A basic language resource kit implementation for the IgboNLP project

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

A basic language resource kit implementation for the IgboNLP project. / Onyenwe, Ikechukwu E.; Hepple, Mark; Chinedu, Uchechukwu et al.
In: ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Vol. 17, No. 2, 10, 31.01.2018.

Harvard

Onyenwe, IE, Hepple, M, Chinedu, U & Ezeani, I 2018, 'A basic language resource kit implementation for the IgboNLP project', ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), vol. 17, no. 2, 10. https://doi.org/10.1145/3146387

APA

Onyenwe, I. E., Hepple, M., Chinedu, U., & Ezeani, I. (2018). A basic language resource kit implementation for the IgboNLP project. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 17(2), Article 10. https://doi.org/10.1145/3146387

Vancouver

Onyenwe IE, Hepple M, Chinedu U, Ezeani I. A basic language resource kit implementation for the IgboNLP project. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP). 2018 Jan 31;17(2):10. doi: 10.1145/3146387

Author

Onyenwe, Ikechukwu E. ; Hepple, Mark ; Chinedu, Uchechukwu et al. / A basic language resource kit implementation for the IgboNLP project. In: ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP). 2018 ; Vol. 17, No. 2.

Bibtex

@article{4e540e148c1243fc99440c32951e853f,
  title = "A basic language resource kit implementation for the IgboNLP project",
  abstract = "Igbo, an African language with around 32 million speakers worldwide, is one of the many languages having few or none of the language processing resources needed for advanced language technology applications. In this article, we describe the approach taken to creating an initial set of resources for Igbo, including an electronic text corpus, a part-of-speech (POS) tagset, and a POS-tagged subcorpus. We discuss the approach taken in gathering texts, the preprocessing of these texts, and the development of the POS tagged corpus. We also discuss some of the problems encountered during corpus and tagset development and the solutions arrived at for these problems.",
  keywords = "African language, Corpora, Corpus annotation, Human annotator, Igbo, Interannotation agreement, Language technology, Morphology, Natural language processing (NLP), Normalization, Part-of-speech (POS) tagging, Segmentation, Tagset, Text processing, Tokenization",
  author = "Onyenwe, {Ikechukwu E.} and Mark Hepple and Uchechukwu Chinedu and Ignatius Ezeani",
  year = "2018",
  month = jan,
  day = "31",
  doi = "10.1145/3146387",
  language = "English",
  volume = "17",
  journal = "ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)",
  issn = "2375-4699",
  publisher = "Association for Computing Machinery (ACM)",
  number = "2",
}

RIS

TY - JOUR
T1 - A basic language resource kit implementation for the IgboNLP project
AU - Onyenwe, Ikechukwu E.
AU - Hepple, Mark
AU - Chinedu, Uchechukwu
AU - Ezeani, Ignatius
PY - 2018/1/31
Y1 - 2018/1/31
N2 - Igbo, an African language with around 32 million speakers worldwide, is one of the many languages having few or none of the language processing resources needed for advanced language technology applications. In this article, we describe the approach taken to creating an initial set of resources for Igbo, including an electronic text corpus, a part-of-speech (POS) tagset, and a POS-tagged subcorpus. We discuss the approach taken in gathering texts, the preprocessing of these texts, and the development of the POS tagged corpus. We also discuss some of the problems encountered during corpus and tagset development and the solutions arrived at for these problems.
AB - Igbo, an African language with around 32 million speakers worldwide, is one of the many languages having few or none of the language processing resources needed for advanced language technology applications. In this article, we describe the approach taken to creating an initial set of resources for Igbo, including an electronic text corpus, a part-of-speech (POS) tagset, and a POS-tagged subcorpus. We discuss the approach taken in gathering texts, the preprocessing of these texts, and the development of the POS tagged corpus. We also discuss some of the problems encountered during corpus and tagset development and the solutions arrived at for these problems.
KW - African language
KW - Corpora
KW - Corpus annotation
KW - Human annotator
KW - Igbo
KW - Interannotation agreement
KW - Language technology
KW - Morphology
KW - Natural language processing (NLP)
KW - Normalization
KW - Part-of-speech (POS) tagging
KW - Segmentation
KW - Tagset
KW - Text processing
KW - Tokenization
U2 - 10.1145/3146387
DO - 10.1145/3146387
M3 - Journal article
AN - SCOPUS:85042519794
VL - 17
JO - ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
JF - ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
SN - 2375-4699
IS - 2
M1 - 10
ER -
