MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition. / Adelani, David Ifeoluwa; Neubig, Graham; Ruder, Sebastian et al.
In: arXiv, Vol. abs/2210.12391, 15.11.2022.

Harvard

Adelani, DI, Neubig, G, Ruder, S, Rijhwani, S, Beukman, M, Palen-Michel, C, Lignos, C, Alabi, JO, Muhammad, SH, Nabende, P, Dione, CMB, Bukula, A, Mabuya, R, Dossou, BFP, Sibanda, B, Buzaaba, H, Mukiibi, J, Kalipe, G, Mbaye, D, Taylor, A, Kabore, FO, Emezue, CC, Anuoluwapo, A, Ogayo, P, Gitau, C, Munkoh-Buabeng, E, Koagne, VM, Tapo, AA, Macucwa, T, Marivate, V, Mboning, E, Gwadabe, T, Adewumi, TP, Ahia, O, Nakatumba-Nabende, J, Mokono, NL, Ezeani, I, Chukwuneke, C, Adeyemi, M, Hacheme, G, Abdulmumin, I, Ogundepo, O, Yousuf, O, Ngoli, TM & Klakow, D 2022, 'MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition', arXiv, vol. abs/2210.12391. https://doi.org/10.48550/arXiv.2210.12391

APA

Adelani, D. I., Neubig, G., Ruder, S., Rijhwani, S., Beukman, M., Palen-Michel, C., Lignos, C., Alabi, J. O., Muhammad, S. H., Nabende, P., Dione, C. M. B., Bukula, A., Mabuya, R., Dossou, B. F. P., Sibanda, B., Buzaaba, H., Mukiibi, J., Kalipe, G., Mbaye, D., ... Klakow, D. (2022). MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition. arXiv, abs/2210.12391. https://doi.org/10.48550/arXiv.2210.12391

Vancouver

Adelani DI, Neubig G, Ruder S, Rijhwani S, Beukman M, Palen-Michel C et al. MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition. arXiv. 2022 Nov 15;abs/2210.12391. doi: 10.48550/arXiv.2210.12391

Author

Adelani, David Ifeoluwa; Neubig, Graham; Ruder, Sebastian et al. / MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition. In: arXiv. 2022; Vol. abs/2210.12391.

Bibtex

@article{56b6ede61311481282252d00e4058cb6,
title = "MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition",
abstract = "African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically-diverse African languages.",
author = "Adelani, {David Ifeoluwa} and Graham Neubig and Sebastian Ruder and Shruti Rijhwani and Michael Beukman and Chester Palen-Michel and Constantine Lignos and Alabi, {Jesujoba O.} and Muhammad, {Shamsuddeen Hassan} and Peter Nabende and Dione, {Cheikh M. Bamba} and Andiswa Bukula and Rooweither Mabuya and Dossou, {Bonaventure F. P.} and Blessing Sibanda and Happy Buzaaba and Jonathan Mukiibi and Godson Kalipe and Derguene Mbaye and Amelia Taylor and Kabore, {Fatoumata Ouoba} and Emezue, {Chris Chinenye} and Aremu Anuoluwapo and Perez Ogayo and Catherine Gitau and Edwin Munkoh-Buabeng and Koagne, {Victoire Memdjokam} and Tapo, {Allahsera Auguste} and Tebogo Macucwa and Vukosi Marivate and Elvis Mboning and Tajuddeen Gwadabe and Adewumi, {Tosin P.} and Orevaoghene Ahia and Joyce Nakatumba-Nabende and Mokono, {Neo L.} and Ignatius Ezeani and Chiamaka Chukwuneke and Mofetoluwa Adeyemi and Gilles Hacheme and Idris Abdulmumin and Odunayo Ogundepo and Oreen Yousuf and Ngoli, {Tatiana Moteu} and Dietrich Klakow",
year = "2022",
month = nov,
day = "15",
doi = "10.48550/arXiv.2210.12391",
language = "English",
volume = "abs/2210.12391",
journal = "arXiv",
issn = "2331-8422",

}

RIS

TY - JOUR

T1 - MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

AU - Adelani, David Ifeoluwa

AU - Neubig, Graham

AU - Ruder, Sebastian

AU - Rijhwani, Shruti

AU - Beukman, Michael

AU - Palen-Michel, Chester

AU - Lignos, Constantine

AU - Alabi, Jesujoba O.

AU - Muhammad, Shamsuddeen Hassan

AU - Nabende, Peter

AU - Dione, Cheikh M. Bamba

AU - Bukula, Andiswa

AU - Mabuya, Rooweither

AU - Dossou, Bonaventure F. P.

AU - Sibanda, Blessing

AU - Buzaaba, Happy

AU - Mukiibi, Jonathan

AU - Kalipe, Godson

AU - Mbaye, Derguene

AU - Taylor, Amelia

AU - Kabore, Fatoumata Ouoba

AU - Emezue, Chris Chinenye

AU - Anuoluwapo, Aremu

AU - Ogayo, Perez

AU - Gitau, Catherine

AU - Munkoh-Buabeng, Edwin

AU - Koagne, Victoire Memdjokam

AU - Tapo, Allahsera Auguste

AU - Macucwa, Tebogo

AU - Marivate, Vukosi

AU - Mboning, Elvis

AU - Gwadabe, Tajuddeen

AU - Adewumi, Tosin P.

AU - Ahia, Orevaoghene

AU - Nakatumba-Nabende, Joyce

AU - Mokono, Neo L.

AU - Ezeani, Ignatius

AU - Chukwuneke, Chiamaka

AU - Adeyemi, Mofetoluwa

AU - Hacheme, Gilles

AU - Abdulmumin, Idris

AU - Ogundepo, Odunayo

AU - Yousuf, Oreen

AU - Ngoli, Tatiana Moteu

AU - Klakow, Dietrich

PY - 2022/11/15

Y1 - 2022/11/15

AB - African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically-diverse African languages.

U2 - 10.48550/arXiv.2210.12391

DO - 10.48550/arXiv.2210.12391

M3 - Journal article

VL - abs/2210.12391

JO - arXiv

JF - arXiv

SN - 2331-8422

ER -