Computing and Communications

Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages. / Nekoto, Wilhelmina; Marivate, Vukosi; Matsila, Tshinondiwa et al.
In: arXiv, 05.10.2020.

Harvard

Nekoto, W, Marivate, V, Matsila, T, Fasubaa, T, Kolawole, T, Fagbohungbe, T, Akinola, SO, Muhammad, SH, Kabongo, S, Osei, S, Freshia, S, Niyongabo, RA, Macharm, R, Ogayo, P, Ahia, O, Meressa, M, Adeyemi, M, Mokgesi-Selinga, M, Okegbemi, L, Martinus, LJ, Tajudeen, K, Degila, K, Ogueji, K, Siminyu, K, Kreutzer, J, Webster, J, Ali, JT, Abbott, J, Orife, I, Ezeani, I, Dangana, IA, Kamper, H, Elsahar, H, Duru, G, Kioko, G, Murhabazi, E, Biljon, EV, Whitenack, D, Onyefuluchi, C, Emezue, C, Dossou, B, Sibanda, B, Bassey, BI, Olabiyi, A, Ramkilowan, A, Öktem, A, Akinfaderin, A & Bashir, A 2020, 'Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages', arXiv.

APA

Nekoto, W., Marivate, V., Matsila, T., Fasubaa, T., Kolawole, T., Fagbohungbe, T., Akinola, S. O., Muhammad, S. H., Kabongo, S., Osei, S., Freshia, S., Niyongabo, R. A., Macharm, R., Ogayo, P., Ahia, O., Meressa, M., Adeyemi, M., Mokgesi-Selinga, M., Okegbemi, L., ... Bashir, A. (2020). Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages. arXiv.

Vancouver

Nekoto W, Marivate V, Matsila T, Fasubaa T, Kolawole T, Fagbohungbe T et al. Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages. arXiv. 2020 Oct 5.

Author

Nekoto, Wilhelmina; Marivate, Vukosi; Matsila, Tshinondiwa et al. / Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages. In: arXiv. 2020.

Bibtex

@article{ae772c31b49d4bd8851bf003b3647b6b,

title = "Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages",

abstract = "Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. {"}Low-resourced{"}-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), that plays a crucial role for information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released under https://github.com/masakhane-io/masakhane-mt.",

keywords = "cs.CL, cs.AI, cs.LG",

author = "Wilhelmina Nekoto and Vukosi Marivate and Tshinondiwa Matsila and Timi Fasubaa and Tajudeen Kolawole and Taiwo Fagbohungbe and Akinola, {Solomon Oluwole} and Muhammad, {Shamsuddeen Hassan} and Salomon Kabongo and Salomey Osei and Sackey Freshia and Niyongabo, {Rubungo Andre} and Ricky Macharm and Perez Ogayo and Orevaoghene Ahia and Musie Meressa and Mofe Adeyemi and Masabata Mokgesi-Selinga and Lawrence Okegbemi and Martinus, {Laura Jane} and Kolawole Tajudeen and Kevin Degila and Kelechi Ogueji and Kathleen Siminyu and Julia Kreutzer and Jason Webster and Ali, {Jamiil Toure} and Jade Abbott and Iroro Orife and Ignatius Ezeani and Dangana, {Idris Abdulkabir} and Herman Kamper and Hady Elsahar and Goodness Duru and Ghollah Kioko and Espoir Murhabazi and Biljon, {Elan van} and Daniel Whitenack and Christopher Onyefuluchi and Chris Emezue and Bonaventure Dossou and Blessing Sibanda and Bassey, {Blessing Itoro} and Ayodele Olabiyi and Arshath Ramkilowan and Alp {\"O}ktem and Adewale Akinfaderin and Abdallah Bashir",

note = "Findings of EMNLP 2020; updated benchmarks",

year = "2020",

month = oct,

day = "5",

language = "English",

journal = "arXiv",

issn = "2331-8422",

}

RIS

TY - JOUR

T1 - Participatory Research for Low-resourced Machine Translation

T2 - A Case Study in African Languages

AU - Nekoto, Wilhelmina

AU - Marivate, Vukosi

AU - Matsila, Tshinondiwa

AU - Fasubaa, Timi

AU - Kolawole, Tajudeen

AU - Fagbohungbe, Taiwo

AU - Akinola, Solomon Oluwole

AU - Muhammad, Shamsuddeen Hassan

AU - Kabongo, Salomon

AU - Osei, Salomey

AU - Freshia, Sackey

AU - Niyongabo, Rubungo Andre

AU - Macharm, Ricky

AU - Ogayo, Perez

AU - Ahia, Orevaoghene

AU - Meressa, Musie

AU - Adeyemi, Mofe

AU - Mokgesi-Selinga, Masabata

AU - Okegbemi, Lawrence

AU - Martinus, Laura Jane

AU - Tajudeen, Kolawole

AU - Degila, Kevin

AU - Ogueji, Kelechi

AU - Siminyu, Kathleen

AU - Kreutzer, Julia

AU - Webster, Jason

AU - Ali, Jamiil Toure

AU - Abbott, Jade

AU - Orife, Iroro

AU - Ezeani, Ignatius

AU - Dangana, Idris Abdulkabir

AU - Kamper, Herman

AU - Elsahar, Hady

AU - Duru, Goodness

AU - Kioko, Ghollah

AU - Murhabazi, Espoir

AU - Biljon, Elan van

AU - Whitenack, Daniel

AU - Onyefuluchi, Christopher

AU - Emezue, Chris

AU - Dossou, Bonaventure

AU - Sibanda, Blessing

AU - Bassey, Blessing Itoro

AU - Olabiyi, Ayodele

AU - Ramkilowan, Arshath

AU - Öktem, Alp

AU - Akinfaderin, Adewale

AU - Bashir, Abdallah

N1 - Findings of EMNLP 2020; updated benchmarks

PY - 2020/10/5

Y1 - 2020/10/5

N2 - Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), that plays a crucial role for information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released under https://github.com/masakhane-io/masakhane-mt.

KW - cs.CL

KW - cs.AI

KW - cs.LG

M3 - Journal article

JO - arXiv

JF - arXiv

SN - 2331-8422

ER -
