A basic language resource kit implementation for the IgboNLP project


Computing and Communications

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Ikechukwu E. Onyenwe
Mark Hepple
Uchechukwu Chinedu
Ignatius Ezeani


Article number	10
Journal publication date	31/01/2018
Journal	ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
Issue number	2
Volume	17
Number of pages	23
Publication Status	Published
Original language	English

Abstract

Igbo, an African language with around 32 million speakers worldwide, is one of the many languages that have few or none of the language processing resources needed for advanced language technology applications. In this article, we describe the approach taken to creating an initial set of resources for Igbo, including an electronic text corpus, a part-of-speech (POS) tagset, and a POS-tagged subcorpus. We discuss how the texts were gathered and preprocessed, and how the POS-tagged corpus was developed. We also discuss some of the problems encountered during corpus and tagset development and the solutions we arrived at.
