
Electronic data

  • IgboNER_2_0_AfricaNLP2023

    Final published version, 417 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License



IgboNER 2.0: Expanding Named Entity Recognition Datasets via Projection

Research output: Contribution to conference - Without ISBN/ISSN, Conference paper (peer-review)

Publication date: 3/03/2023
Original language: English
Event: AfricaNLP 2023: African NLP in the Era of Large Language Models - Radisson Blu Hotel and Convention Center, Kigali, Rwanda
Duration: 5/05/2023 - 5/05/2023


Workshop: AfricaNLP 2023


Since the inception of state-of-the-art neural network models for natural language processing research, the major challenge faced by low-resource languages has been the lack or insufficiency of annotated training data. The named entity recognition (NER) task is no exception. The need for an efficient data creation and annotation process, especially for low-resource languages, cannot be overemphasized. In this work, we leverage an existing NER tool for English in a cross-language projection method that automatically creates a mapping dictionary of entities in a source language and their translations in the target language using a parallel English-Igbo corpus. The resulting mapping dictionary, which was manually checked and corrected by human annotators, was used to automatically generate and format an NER training dataset from the Igbo monolingual corpus, thereby saving considerable annotation time for the Igbo NER task. The generated dataset was also included in the training process, and our experiments show improved performance over previous work.
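
The dataset-generation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the checked mapping dictionary is a plain lookup from lowercased entity strings to entity types, that tagging uses greedy longest-match with BIO labels, and that the output is a CoNLL-style "token tag" layout. The names `TOY_DICT`, `tag_sentence`, and `to_conll`, the dictionary entries, and the example sentence are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the manually checked mapping dictionary:
# lowercased target-language entity string -> entity type.
TOY_DICT: Dict[str, str] = {
    "chinua achebe": "PER",
    "enugu": "LOC",
}


def tag_sentence(
    tokens: List[str], entity_dict: Dict[str, str], max_len: int = 4
) -> List[Tuple[str, str]]:
    """Label tokens with BIO tags using greedy longest-match dictionary lookup."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            label = entity_dict.get(phrase)
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return list(zip(tokens, tags))


def to_conll(tagged: List[Tuple[str, str]]) -> str:
    """Format one tagged sentence as 'token tag' lines (assumed CoNLL-style layout)."""
    return "\n".join(f"{token} {tag}" for token, tag in tagged)


if __name__ == "__main__":
    # Toy sentence for illustration only.
    sentence = ["Chinua", "Achebe", "bi", "na", "Enugu"]
    print(to_conll(tag_sentence(sentence, TOY_DICT)))
```

Run on the toy sentence, the sketch labels "Chinua Achebe" as B-PER/I-PER, "Enugu" as B-LOC, and the remaining tokens as O. Longest-match is chosen here so that multi-word entities are not split into shorter dictionary hits; a real pipeline would also need to handle tokenization, diacritics, and ambiguous entries.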