
Electronic data

  • 2311.09828v1

    Final published version, 1.7 MB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License


Keywords

  • cs.CL


AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages

Research output: Working paper › Preprint

Published

Standard

AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages. / Wang, Jiayi; Adelani, David Ifeoluwa; Agrawal, Sweta et al.
Arxiv, 2023.

Research output: Working paper › Preprint

Harvard

Wang, J, Adelani, DI, Agrawal, S, Rei, R, Briakou, E, Carpuat, M, Masiak, M, He, X, Bourhim, S, Bukula, A, Mohamed, M, Olatoye, T, Mokayede, H, Mwase, C, Kimotho, W, Yuehgoh, F, Aremu, A, Ojo, J, Muhammad, SH, Osei, S, Omotayo, A-H, Chukwuneke, C, Ogayo, P, Hourrane, O, Anigri, SE, Ndolela, L, Mangwana, T, Mohamed, SA, Hassan, A, Awoyomi, OO, Alkhaled, L, Al-Azzawi, S, Etori, NA, Ochieng, M, Siro, C, Njoroge, S, Muchiri, E, Kimotho, W, Momo, LNW, Abolade, D, Ajao, S, Adewumi, T, Shode, I, Macharm, R, Iro, RN, Abdullahi, SS, Moore, SE, Opoku, B, Akinjobi, Z, Afolabi, A, Obiefuna, N, Ogbu, OR, Brian, S, Otiende, VA, Mbonu, CE, Sari, ST & Stenetorp, P 2023 'AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages' Arxiv. <https://arxiv.org/abs/2311.09828v1>

APA

Wang, J., Adelani, D. I., Agrawal, S., Rei, R., Briakou, E., Carpuat, M., Masiak, M., He, X., Bourhim, S., Bukula, A., Mohamed, M., Olatoye, T., Mokayede, H., Mwase, C., Kimotho, W., Yuehgoh, F., Aremu, A., Ojo, J., Muhammad, S. H., ... Stenetorp, P. (2023). AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages. Arxiv. https://arxiv.org/abs/2311.09828v1

Vancouver

Wang J, Adelani DI, Agrawal S, Rei R, Briakou E, Carpuat M et al. AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages. Arxiv. 2023 Nov 16.

Author

Wang, Jiayi; Adelani, David Ifeoluwa; Agrawal, Sweta et al. / AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages. Arxiv, 2023.

Bibtex

@techreport{a9635ce1dfb34fb8a24c623949f030fc,
title = "AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages",
abstract = "Despite the progress we have recorded in scaling multilingual machine translation (MT) models and evaluation data to several under-resourced African languages, it is difficult to measure accurately the progress we have made on these languages because evaluation is often performed on n-gram matching metrics like BLEU that often have worse correlation with human judgments. Embedding-based metrics such as COMET correlate better; however, lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with a simplified MQM guideline for error-span annotation and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET, a COMET evaluation metric for African languages by leveraging DA training data from high-resource languages and African-centric multilingual encoder (AfroXLM-Roberta) to create the state-of-the-art evaluation metric for African languages MT with respect to Spearman-rank correlation with human judgments (+0.406).",
keywords = "cs.CL",
author = "Jiayi Wang and Adelani, {David Ifeoluwa} and Sweta Agrawal and Ricardo Rei and Eleftheria Briakou and Marine Carpuat and Marek Masiak and Xuanli He and Sofia Bourhim and Andiswa Bukula and Muhidin Mohamed and Temitayo Olatoye and Hamam Mokayede and Christine Mwase and Wangui Kimotho and Foutse Yuehgoh and Anuoluwapo Aremu and Jessica Ojo and Muhammad, {Shamsuddeen Hassan} and Salomey Osei and Abdul-Hakeem Omotayo and Chiamaka Chukwuneke and Perez Ogayo and Oumaima Hourrane and Anigri, {Salma El} and Lolwethu Ndolela and Thabiso Mangwana and Mohamed, {Shafie Abdi} and Ayinde Hassan and Awoyomi, {Oluwabusayo Olufunke} and Lama Alkhaled and Sana Al-Azzawi and Etori, {Naome A.} and Millicent Ochieng and Clemencia Siro and Samuel Njoroge and Eric Muchiri and Wangari Kimotho and Momo, {Lyse Naomi Wamba} and Daud Abolade and Simbiat Ajao and Tosin Adewumi and Iyanuoluwa Shode and Ricky Macharm and Iro, {Ruqayya Nasir} and Abdullahi, {Saheed S.} and Moore, {Stephen E.} and Bernard Opoku and Zainab Akinjobi and Abeeb Afolabi and Nnaemeka Obiefuna and Ogbu, {Onyekachi Raphael} and Sam Brian and Otiende, {Verrah Akinyi} and Mbonu, {Chinedu Emmanuel} and Sari, {Sakayo Toadoum} and Pontus Stenetorp",
year = "2023",
month = nov,
day = "16",
language = "English",
publisher = "Arxiv",
type = "WorkingPaper",
institution = "Arxiv",

}
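
The abstract above reports the headline result for AfriCOMET as a Spearman-rank correlation with human judgments (+0.406). As a minimal, purely illustrative sketch of what that statistic measures, the Python snippet below computes a segment-level Spearman-rank correlation between automatic metric scores and human direct assessment (DA) ratings using scipy.stats.spearmanr; the score values are invented placeholders, not data from the paper.

# Illustrative only: Spearman-rank correlation between automatic metric
# scores and human direct-assessment (DA) ratings, the statistic reported
# for AfriCOMET. The values below are made-up placeholders, not paper data.
from scipy.stats import spearmanr

human_da_ratings = [78.0, 42.5, 90.0, 55.0, 67.5, 30.0]  # human DA ratings per segment
metric_scores    = [0.81, 0.40, 0.88, 0.61, 0.70, 0.35]  # metric scores for the same segments

rho, p_value = spearmanr(human_da_ratings, metric_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")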

RIS

TY - UNPB

T1 - AfriMTE and AfriCOMET

T2 - Empowering COMET to Embrace Under-resourced African Languages

AU - Wang, Jiayi

AU - Adelani, David Ifeoluwa

AU - Agrawal, Sweta

AU - Rei, Ricardo

AU - Briakou, Eleftheria

AU - Carpuat, Marine

AU - Masiak, Marek

AU - He, Xuanli

AU - Bourhim, Sofia

AU - Bukula, Andiswa

AU - Mohamed, Muhidin

AU - Olatoye, Temitayo

AU - Mokayede, Hamam

AU - Mwase, Christine

AU - Kimotho, Wangui

AU - Yuehgoh, Foutse

AU - Aremu, Anuoluwapo

AU - Ojo, Jessica

AU - Muhammad, Shamsuddeen Hassan

AU - Osei, Salomey

AU - Omotayo, Abdul-Hakeem

AU - Chukwuneke, Chiamaka

AU - Ogayo, Perez

AU - Hourrane, Oumaima

AU - Anigri, Salma El

AU - Ndolela, Lolwethu

AU - Mangwana, Thabiso

AU - Mohamed, Shafie Abdi

AU - Hassan, Ayinde

AU - Awoyomi, Oluwabusayo Olufunke

AU - Alkhaled, Lama

AU - Al-Azzawi, Sana

AU - Etori, Naome A.

AU - Ochieng, Millicent

AU - Siro, Clemencia

AU - Njoroge, Samuel

AU - Muchiri, Eric

AU - Kimotho, Wangari

AU - Momo, Lyse Naomi Wamba

AU - Abolade, Daud

AU - Ajao, Simbiat

AU - Adewumi, Tosin

AU - Shode, Iyanuoluwa

AU - Macharm, Ricky

AU - Iro, Ruqayya Nasir

AU - Abdullahi, Saheed S.

AU - Moore, Stephen E.

AU - Opoku, Bernard

AU - Akinjobi, Zainab

AU - Afolabi, Abeeb

AU - Obiefuna, Nnaemeka

AU - Ogbu, Onyekachi Raphael

AU - Brian, Sam

AU - Otiende, Verrah Akinyi

AU - Mbonu, Chinedu Emmanuel

AU - Sari, Sakayo Toadoum

AU - Stenetorp, Pontus

PY - 2023/11/16

Y1 - 2023/11/16

N2 - Despite the progress we have recorded in scaling multilingual machine translation (MT) models and evaluation data to several under-resourced African languages, it is difficult to measure accurately the progress we have made on these languages because evaluation is often performed on n-gram matching metrics like BLEU that often have worse correlation with human judgments. Embedding-based metrics such as COMET correlate better; however, lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with a simplified MQM guideline for error-span annotation and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET, a COMET evaluation metric for African languages by leveraging DA training data from high-resource languages and African-centric multilingual encoder (AfroXLM-Roberta) to create the state-of-the-art evaluation metric for African languages MT with respect to Spearman-rank correlation with human judgments (+0.406).

AB - Despite the progress we have recorded in scaling multilingual machine translation (MT) models and evaluation data to several under-resourced African languages, it is difficult to measure accurately the progress we have made on these languages because evaluation is often performed on n-gram matching metrics like BLEU that often have worse correlation with human judgments. Embedding-based metrics such as COMET correlate better; however, lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with a simplified MQM guideline for error-span annotation and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET, a COMET evaluation metric for African languages by leveraging DA training data from high-resource languages and African-centric multilingual encoder (AfroXLM-Roberta) to create the state-of-the-art evaluation metric for African languages MT with respect to Spearman-rank correlation with human judgments (+0.406).

KW - cs.CL

M3 - Preprint

BT - AfriMTE and AfriCOMET

PB - Arxiv

ER -