Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

Research output: Contribution to Journal/Magazine, Journal article, peer-review

Published
  • Wilhelmina Nekoto
  • Vukosi Marivate
  • Tshinondiwa Matsila
  • Timi Fasubaa
  • Tajudeen Kolawole
  • Taiwo Fagbohungbe
  • Solomon Oluwole Akinola
  • Shamsuddeen Hassan Muhammad
  • Salomon Kabongo
  • Salomey Osei
  • Sackey Freshia
  • Rubungo Andre Niyongabo
  • Ricky Macharm
  • Perez Ogayo
  • Orevaoghene Ahia
  • Musie Meressa
  • Mofe Adeyemi
  • Masabata Mokgesi-Selinga
  • Lawrence Okegbemi
  • Laura Jane Martinus
  • Kolawole Tajudeen
  • Kevin Degila
  • Kelechi Ogueji
  • Kathleen Siminyu
  • Julia Kreutzer
  • Jason Webster
  • Jamiil Toure Ali
  • Jade Abbott
  • Iroro Orife
  • Idris Abdulkabir Dangana
  • Herman Kamper
  • Hady Elsahar
  • Goodness Duru
  • Ghollah Kioko
  • Espoir Murhabazi
  • Elan van Biljon
  • Daniel Whitenack
  • Christopher Onyefuluchi
  • Chris Emezue
  • Bonaventure Dossou
  • Blessing Sibanda
  • Blessing Itoro Bassey
  • Ayodele Olabiyi
  • Arshath Ramkilowan
  • Alp Öktem
  • Adewale Akinfaderin
  • Abdallah Bashir
Journal publication date: 5/10/2020
Journal: arXiv
Publication status: Published
Original language: English

Abstract

Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role for information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released under https://github.com/masakhane-io/masakhane-mt.

Bibliographic note

Findings of EMNLP 2020; updated benchmarks