
IgboBERT Models: Building and Training Transformer Models for the Igbo Language

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

IgboBERT Models: Building and Training Transformer Models for the Igbo Language. / Chukwuneke, CI; Rayson, Paul; Ezeani, Ignatius et al.
LREC 2022 Conference Proceedings. ed. / Nicoletta Calzolari. Paris: European Language Resources Association (ELRA), 2022. p. 5114–5122.


Harvard

Chukwuneke, CI, Rayson, P, Ezeani, I & El-Haj, M 2022, IgboBERT Models: Building and Training Transformer Models for the Igbo Language. in N Calzolari (ed.), LREC 2022 Conference Proceedings. European Language Resources Association (ELRA), Paris, pp. 5114–5122, 13th Language Resources and Evaluation Conference, Marseille, France, 20/06/22. <http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.547.pdf>

APA

Chukwuneke, CI., Rayson, P., Ezeani, I., & El-Haj, M. (2022). IgboBERT Models: Building and Training Transformer Models for the Igbo Language. In N. Calzolari (Ed.), LREC 2022 Conference Proceedings (pp. 5114–5122). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.547.pdf

Vancouver

Chukwuneke CI, Rayson P, Ezeani I, El-Haj M. IgboBERT Models: Building and Training Transformer Models for the Igbo Language. In Calzolari N, editor, LREC 2022 Conference Proceedings. Paris: European Language Resources Association (ELRA). 2022. p. 5114–5122

Author

Chukwuneke, CI; Rayson, Paul; Ezeani, Ignatius et al. / IgboBERT Models: Building and Training Transformer Models for the Igbo Language. LREC 2022 Conference Proceedings. editor / Nicoletta Calzolari. Paris: European Language Resources Association (ELRA), 2022. pp. 5114–5122

Bibtex

@inproceedings{b45820cf4d75444e93dcbe09febe4f06,
title = "IgboBERT Models: Building and Training Transformer Models for the Igbo Language",
abstract = "This work presents a standard Igbo named entity recognition (IgboNER) dataset as well as the results from training and fine-tuning state-of-the-art transformer IgboNER models. We discuss the process of our dataset creation: data collection, annotation and quality checking. We also present the experimental processes involved in building an IgboBERT language model from scratch, as well as fine-tuning it along with other non-Igbo pre-trained models for the downstream IgboNER task. Our results show that, although the IgboNER task benefited hugely from fine-tuning a large transformer model, fine-tuning a transformer model built from scratch with comparatively little Igbo text data seems to yield quite decent results for the IgboNER task. This work will contribute immensely to IgboNLP in particular, as well as to the wider African and low-resource NLP efforts.",
keywords = "Igbo, named entity recognition, BERT models, under-resourced, dataset",
author = "CI Chukwuneke and Paul Rayson and Ignatius Ezeani and Mahmoud El-Haj",
year = "2022",
month = jun,
day = "20",
language = "English",
pages = "5114–5122",
editor = "Calzolari, {Nicoletta}",
booktitle = "LREC 2022 Conference Proceedings",
publisher = "European Language Resources Association (ELRA)",
note = "13th Language Resources and Evaluation Conference, LREC 2022 ; Conference date: 20-06-2022 Through 25-06-2022",
url = "https://lrec2022.lrec-conf.org/en/",
}
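The abstract above mentions dataset creation, annotation and quality checking for the IgboNER corpus. This page does not show the authors' annotation format, but NER corpora are very commonly encoded with the BIO tagging scheme. The sketch below is purely illustrative (not the authors' code): it converts hypothetical entity spans over an Igbo sentence into per-token BIO tags.

```python
# Illustrative BIO-tagging sketch for an NER dataset. BIO is a common
# convention; the IgboBERT paper's exact annotation format is not shown
# on this page, so treat this as an assumption.

def to_bio(tokens, spans):
    """tokens: list of words; spans: list of (start, end, label)
    with end exclusive, indexing into the token list."""
    tags = ["O"] * len(tokens)  # default: outside any entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # entity begins here
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags

# Hypothetical example sentence: "Chinedu bi na Enugu"
# ("Chinedu lives in Enugu"), with a person and a location entity.
tokens = ["Chinedu", "bi", "na", "Enugu"]
spans = [(0, 1, "PER"), (3, 4, "LOC")]
print(to_bio(tokens, spans))  # ['B-PER', 'O', 'O', 'B-LOC']
```

Token-level tags in this shape are what a transformer token-classification head (as used for the downstream IgboNER task described in the abstract) would be trained to predict.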

RIS

TY - GEN

T1 - IgboBERT Models: Building and Training Transformer Models for the Igbo Language

T2 - 13th Language Resources and Evaluation Conference

AU - Chukwuneke, CI

AU - Rayson, Paul

AU - Ezeani, Ignatius

AU - El-Haj, Mahmoud

PY - 2022/6/20

Y1 - 2022/6/20

N2 - This work presents a standard Igbo named entity recognition (IgboNER) dataset as well as the results from training and fine-tuning state-of-the-art transformer IgboNER models. We discuss the process of our dataset creation: data collection, annotation and quality checking. We also present the experimental processes involved in building an IgboBERT language model from scratch, as well as fine-tuning it along with other non-Igbo pre-trained models for the downstream IgboNER task. Our results show that, although the IgboNER task benefited hugely from fine-tuning a large transformer model, fine-tuning a transformer model built from scratch with comparatively little Igbo text data seems to yield quite decent results for the IgboNER task. This work will contribute immensely to IgboNLP in particular, as well as to the wider African and low-resource NLP efforts.

AB - This work presents a standard Igbo named entity recognition (IgboNER) dataset as well as the results from training and fine-tuning state-of-the-art transformer IgboNER models. We discuss the process of our dataset creation: data collection, annotation and quality checking. We also present the experimental processes involved in building an IgboBERT language model from scratch, as well as fine-tuning it along with other non-Igbo pre-trained models for the downstream IgboNER task. Our results show that, although the IgboNER task benefited hugely from fine-tuning a large transformer model, fine-tuning a transformer model built from scratch with comparatively little Igbo text data seems to yield quite decent results for the IgboNER task. This work will contribute immensely to IgboNLP in particular, as well as to the wider African and low-resource NLP efforts.

KW - Igbo

KW - named entity recognition

KW - BERT models

KW - under-resourced

KW - dataset

M3 - Conference contribution/Paper

SP - 5114

EP - 5122

BT - LREC 2022 Conference Proceedings

A2 - Calzolari, Nicoletta

PB - European Language Resources Association (ELRA)

CY - Paris

Y2 - 20 June 2022 through 25 June 2022

ER -