Final published version
Licence: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License
Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
TY - GEN
T1 - IgboBERT Models
T2 - 13th Language Resources and Evaluation Conference
AU - Chukwuneke, CI
AU - Rayson, Paul
AU - Ezeani, Ignatius
AU - El-Haj, Mahmoud
PY - 2022/6/20
Y1 - 2022/6/20
N2 - This work presents a standard Igbo named entity recognition (IgboNER) dataset as well as the results from training and fine-tuning state-of-the-art transformer IgboNER models. We discuss our dataset creation process: data collection, annotation, and quality checking. We also present the experimental processes involved in building an IgboBERT language model from scratch, as well as fine-tuning it, alongside other non-Igbo pre-trained models, for the downstream IgboNER task. Our results show that, although the IgboNER task benefited hugely from fine-tuning a large transformer model, fine-tuning a transformer model built from scratch with comparatively little Igbo text data seems to yield quite decent results for the IgboNER task. This work will contribute immensely to IgboNLP in particular, as well as to wider African and low-resource NLP efforts.
AB - This work presents a standard Igbo named entity recognition (IgboNER) dataset as well as the results from training and fine-tuning state-of-the-art transformer IgboNER models. We discuss our dataset creation process: data collection, annotation, and quality checking. We also present the experimental processes involved in building an IgboBERT language model from scratch, as well as fine-tuning it, alongside other non-Igbo pre-trained models, for the downstream IgboNER task. Our results show that, although the IgboNER task benefited hugely from fine-tuning a large transformer model, fine-tuning a transformer model built from scratch with comparatively little Igbo text data seems to yield quite decent results for the IgboNER task. This work will contribute immensely to IgboNLP in particular, as well as to wider African and low-resource NLP efforts.
KW - Igbo
KW - named entity recognition
KW - BERT models
KW - under-resourced
KW - dataset
M3 - Conference contribution/Paper
SP - 5114
EP - 5122
BT - LREC 2022 Conference Proceedings
A2 - Calzolari, Nicoletta
PB - European Language Resources Association (ELRA)
CY - Paris
Y2 - 20 June 2022 through 25 June 2022
ER -