Participatory Research for Low-resourced Machine Translation - Research Portal

Computing and Communications

Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Wilhelmina Nekoto
Vukosi Marivate
Tshinondiwa Matsila
Timi Fasubaa
Tajudeen Kolawole
Taiwo Fagbohungbe
Solomon Oluwole Akinola
Shamsuddeen Hassan Muhammad
Salomon Kabongo
Salomey Osei
Sackey Freshia
Rubungo Andre Niyongabo
Ricky Macharm
Perez Ogayo
Orevaoghene Ahia
Musie Meressa
Mofe Adeyemi
Masabata Mokgesi-Selinga
Lawrence Okegbemi
Laura Jane Martinus
Kolawole Tajudeen
Kevin Degila
Kelechi Ogueji
Kathleen Siminyu
Julia Kreutzer
Jason Webster
Jamiil Toure Ali
Jade Abbott
Iroro Orife
Idris Abdulkabir Dangana
Herman Kamper
Hady Elsahar
Goodness Duru
Ghollah Kioko
Espoir Murhabazi
Elan van Biljon
Daniel Whitenack
Christopher Onyefuluchi
Chris Emezue
Bonaventure Dossou
Blessing Sibanda
Blessing Itoro Bassey
Ayodele Olabiyi
Arshath Ramkilowan
Alp Öktem
Adewale Akinfaderin
Abdallah Bashir

Journal publication date	5/10/2020
Journal	arXiv
Publication Status	Published
Original language	English

Abstract

Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role in information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released at https://github.com/masakhane-io/masakhane-mt.

Bibliographic note

Findings of EMNLP 2020; updated benchmarks
