Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - Participatory Research for Low-resourced Machine Translation
T2 - A Case Study in African Languages
AU - Nekoto, Wilhelmina
AU - Marivate, Vukosi
AU - Matsila, Tshinondiwa
AU - Fasubaa, Timi
AU - Kolawole, Tajudeen
AU - Fagbohungbe, Taiwo
AU - Akinola, Solomon Oluwole
AU - Muhammad, Shamsuddeen Hassan
AU - Kabongo, Salomon
AU - Osei, Salomey
AU - Freshia, Sackey
AU - Niyongabo, Rubungo Andre
AU - Macharm, Ricky
AU - Ogayo, Perez
AU - Ahia, Orevaoghene
AU - Meressa, Musie
AU - Adeyemi, Mofe
AU - Mokgesi-Selinga, Masabata
AU - Okegbemi, Lawrence
AU - Martinus, Laura Jane
AU - Tajudeen, Kolawole
AU - Degila, Kevin
AU - Ogueji, Kelechi
AU - Siminyu, Kathleen
AU - Kreutzer, Julia
AU - Webster, Jason
AU - Ali, Jamiil Toure
AU - Abbott, Jade
AU - Orife, Iroro
AU - Ezeani, Ignatius
AU - Dangana, Idris Abdulkabir
AU - Kamper, Herman
AU - Elsahar, Hady
AU - Duru, Goodness
AU - Kioko, Ghollah
AU - Murhabazi, Espoir
AU - Biljon, Elan van
AU - Whitenack, Daniel
AU - Onyefuluchi, Christopher
AU - Emezue, Chris
AU - Dossou, Bonaventure
AU - Sibanda, Blessing
AU - Bassey, Blessing Itoro
AU - Olabiyi, Ayodele
AU - Ramkilowan, Arshath
AU - Öktem, Alp
AU - Akinfaderin, Adewale
AU - Bashir, Abdallah
N1 - Findings of EMNLP 2020; updated benchmarks
PY - 2020/10/5
Y1 - 2020/10/5
N2 - Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), that plays a crucial role for information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released under https://github.com/masakhane-io/masakhane-mt.
AB - Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), that plays a crucial role for information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released under https://github.com/masakhane-io/masakhane-mt.
KW - cs.CL
KW - cs.AI
KW - cs.LG
M3 - Journal article
JO - arXiv
JF - arXiv
SN - 2331-8422
ER -