
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

Research output: Contribution to Journal/Magazine › Journal article › peer-review

  • David Ifeoluwa Adelani
  • Graham Neubig
  • Sebastian Ruder
  • Shruti Rijhwani
  • Michael Beukman
  • Chester Palen-Michel
  • Constantine Lignos
  • Jesujoba O. Alabi
  • Shamsuddeen Hassan Muhammad
  • Peter Nabende
  • Cheikh M. Bamba Dione
  • Andiswa Bukula
  • Rooweither Mabuya
  • Bonaventure F. P. Dossou
  • Blessing Sibanda
  • Happy Buzaaba
  • Jonathan Mukiibi
  • Godson Kalipe
  • Derguene Mbaye
  • Amelia Taylor
  • Fatoumata Ouoba Kabore
  • Chris Chinenye Emezue
  • Aremu Anuoluwapo
  • Perez Ogayo
  • Catherine Gitau
  • Edwin Munkoh-Buabeng
  • Victoire Memdjokam Koagne
  • Allahsera Auguste Tapo
  • Tebogo Macucwa
  • Vukosi Marivate
  • Elvis Mboning
  • Tajuddeen Gwadabe
  • Tosin P. Adewumi
  • Orevaoghene Ahia
  • Joyce Nakatumba-Nabende
  • Neo L. Mokono
  • Mofetoluwa Adeyemi
  • Gilles Hacheme
  • Idris Abdulmumin
  • Odunayo Ogundepo
  • Oreen Yousuf
  • Tatiana Moteu Ngoli
  • Dietrich Klakow
Journal publication date: 15/11/2022
Journal: arXiv
Volume: abs/2210.12391
Number of pages: 22
Publication status: Published
Original language: English

Abstract

African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically diverse African languages.
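The zero-shot F1 scores reported in the abstract are entity-level F1 over BIO tag sequences, the standard NER metric. The following is a minimal illustrative sketch of that metric (not the authors' evaluation code; function names are our own), assuming strict BIO tagging where an entity is counted correct only if its span and type both match:

```python
def extract_spans(tags):
    """Collect (start, end, type) entity spans from one BIO tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        # Close the open span on "O", on a new "B-", or on a type change.
        if start is not None and (
            tag == "O" or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != etype)
        ):
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold, pred):
    """Micro-averaged entity-level F1 over parallel lists of tag sequences."""
    g, p = set(), set()
    for k, (gt, pt) in enumerate(zip(gold, pred)):
        g |= {(k,) + s for s in extract_spans(gt)}  # gold entities
        p |= {(k,) + s for s in extract_spans(pt)}  # predicted entities
    tp = len(g & p)  # exact span-and-type matches
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

For example, if the gold sequence contains a PER and a LOC entity but the prediction recovers only the PER span, precision is 1.0, recall is 0.5, and the F1 is 2/3. Libraries such as seqeval implement the same span-matching logic.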