Electronic data

  • IgboNER_2_0_AfricaNLP2023

    Final published version, 417 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

IgboNER 2.0: Expanding Named Entity Recognition Datasets via Projection

Research output: Contribution to conference - Without ISBN/ISSN › Conference paper › peer-review

Forthcoming

Standard

IgboNER 2.0: Expanding Named Entity Recognition Datasets via Projection. / Chukwuneke, CI; Rayson, Paul; Ezeani, Ignatius et al.
2023. Paper presented at AfricaNLP 2023, Kigali, Rwanda.

Harvard

Chukwuneke, CI, Rayson, P, Ezeani, I, El-Haj, M, Asogwa, D, Okpalla, C & Mbonu, C 2023, 'IgboNER 2.0: Expanding Named Entity Recognition Datasets via Projection', Paper presented at AfricaNLP 2023, Kigali, Rwanda, 5/05/23 - 5/05/23. <https://openreview.net/pdf?id=tHUS9-vmUfC>

Vancouver

Chukwuneke CI, Rayson P, Ezeani I, El-Haj M, Asogwa D, Okpalla C et al. IgboNER 2.0: Expanding Named Entity Recognition Datasets via Projection. 2023. Paper presented at AfricaNLP 2023, Kigali, Rwanda.

BibTeX

@conference{a4c3c7bfb32b49cb86e3f63d73081a6b,
title = "IgboNER 2.0: Expanding Named Entity Recognition Datasets via Projection",
abstract = "Since the inception of the state-of-the-art neural network models for natural language processing research, the major challenge faced by low-resource languages is the lack or insufficiency of annotated training data. The named entity recognition (NER) task is no exception. The need for an efficient data creation and annotation process, especially for low-resource languages, cannot be over-emphasized. In this work, we leverage an existing NER tool for English in a cross-language projection method that automatically creates a mapping dictionary of entities in a source language and their translations in the target language using a parallel English-Igbo corpus. The resultant mapping dictionary, which was manually checked and corrected by human annotators, was used to automatically generate and format an NER training dataset from the Igbo monolingual corpus, thereby saving a lot of annotation time for the Igbo NER task. The generated dataset was also included in the training process and our experiments show improved performance results from previous works.",
author = "CI Chukwuneke and Paul Rayson and Ignatius Ezeani and Mahmoud El-Haj and Doris Asogwa and Chidimma Okpalla and Chinedu Mbonu",
year = "2023",
month = mar,
day = "3",
language = "English",
note = "AfricaNLP 2023 : African NLP in the Era of Large Language Models ; Conference date: 05-05-2023 Through 05-05-2023",
url = "https://sites.google.com/view/africanlp2023/home?pli=1",

}

RIS

TY  - CONF
T1  - IgboNER 2.0
T2  - AfricaNLP 2023
AU  - Chukwuneke, CI
AU  - Rayson, Paul
AU  - Ezeani, Ignatius
AU  - El-Haj, Mahmoud
AU  - Asogwa, Doris
AU  - Okpalla, Chidimma
AU  - Mbonu, Chinedu
PY  - 2023/3/3
Y1  - 2023/3/3
N2  - Since the inception of the state-of-the-art neural network models for natural language processing research, the major challenge faced by low-resource languages is the lack or insufficiency of annotated training data. The named entity recognition (NER) task is no exception. The need for an efficient data creation and annotation process, especially for low-resource languages, cannot be over-emphasized. In this work, we leverage an existing NER tool for English in a cross-language projection method that automatically creates a mapping dictionary of entities in a source language and their translations in the target language using a parallel English-Igbo corpus. The resultant mapping dictionary, which was manually checked and corrected by human annotators, was used to automatically generate and format an NER training dataset from the Igbo monolingual corpus, thereby saving a lot of annotation time for the Igbo NER task. The generated dataset was also included in the training process and our experiments show improved performance results from previous works.
AB  - Since the inception of the state-of-the-art neural network models for natural language processing research, the major challenge faced by low-resource languages is the lack or insufficiency of annotated training data. The named entity recognition (NER) task is no exception. The need for an efficient data creation and annotation process, especially for low-resource languages, cannot be over-emphasized. In this work, we leverage an existing NER tool for English in a cross-language projection method that automatically creates a mapping dictionary of entities in a source language and their translations in the target language using a parallel English-Igbo corpus. The resultant mapping dictionary, which was manually checked and corrected by human annotators, was used to automatically generate and format an NER training dataset from the Igbo monolingual corpus, thereby saving a lot of annotation time for the Igbo NER task. The generated dataset was also included in the training process and our experiments show improved performance results from previous works.
M3  - Conference paper
Y2  - 5 May 2023 through 5 May 2023
ER  - 
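The abstract describes a two-step pipeline: build a mapping dictionary of entities from a parallel English-Igbo corpus using an English NER tool, then use that dictionary to automatically label an Igbo monolingual corpus. The Python sketch below illustrates the idea under loudly stated assumptions and is not the authors' implementation. The function names (`build_mapping_dictionary`, `annotate`, `to_conll`), the toy sentences, the PER/LOC labels, the BIO tagging scheme and the "token tag" output format are all hypothetical. English NER output is assumed to be given, tokenization is naive whitespace splitting, and an entity is mapped only when its English surface form appears verbatim on the Igbo side.

```python
"""Toy sketch of dictionary-based NER projection (hypothetical, not the paper's code)."""
from collections import Counter, defaultdict


def contains_span(tokens, span):
    """Return True if the token list `span` occurs contiguously in `tokens`."""
    n = len(span)
    return n > 0 and any(tokens[i:i + n] == span for i in range(len(tokens) - n + 1))


def build_mapping_dictionary(parallel):
    """Build an {Igbo entity string: label} dictionary from parallel sentences.

    Each item is (english_sentence, igbo_sentence, english_entities), where
    english_entities are (surface, label) pairs ASSUMED to come from an
    English NER tool. ASSUMPTION: an entity is kept only if its surface form
    appears verbatim in the Igbo sentence; translated names are missed.
    """
    votes = defaultdict(Counter)
    for _english, igbo, entities in parallel:
        igbo_tokens = igbo.split()  # naive whitespace tokenization
        for surface, label in entities:
            if contains_span(igbo_tokens, surface.split()):
                votes[surface][label] += 1
    # Resolve conflicting labels for the same string by majority vote.
    return {surface: counts.most_common(1)[0][0] for surface, counts in votes.items()}


def annotate(sentence, mapping):
    """Tag a monolingual Igbo sentence with BIO labels via longest-match lookup."""
    tokens = sentence.split()
    tags = ["O"] * len(tokens)
    entries = sorted(
        # Skip empty keys, which would otherwise match everywhere.
        ((surface.split(), label) for surface, label in mapping.items() if surface.split()),
        key=lambda entry: len(entry[0]),
        reverse=True,  # try longer (multi-token) entities first
    )
    i = 0
    while i < len(tokens):
        for entity_tokens, label in entries:
            n = len(entity_tokens)
            if tokens[i:i + n] == entity_tokens:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            i += 1  # no dictionary entity starts at this token
    return list(zip(tokens, tags))


def to_conll(sentences, mapping):
    """Format annotated sentences as 'token tag' lines, a blank line between sentences."""
    return "\n\n".join(
        "\n".join(f"{token} {tag}" for token, tag in annotate(sentence, mapping))
        for sentence in sentences
    )


if __name__ == "__main__":
    # Toy parallel data for illustration only.
    parallel = [
        ("Chinedu lives in Enugu", "Chinedu bi na Enugu", [("Chinedu", "PER"), ("Enugu", "LOC")]),
        ("Ada went to Lagos", "Ada gara Lagos", [("Ada", "PER"), ("Lagos", "LOC")]),
    ]
    mapping = build_mapping_dictionary(parallel)
    print(to_conll(["Ada bi na Enugu"], mapping))
    # Ada B-PER
    # bi O
    # na O
    # Enugu B-LOC
```

Entities whose Igbo form differs from the English one are skipped by this naive string matching. In the paper, the mapping dictionary was manually checked and corrected by human annotators, a step this sketch does not attempt to replicate.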