Final published version, 1.7 MB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Research output: Working paper › Preprint
TY - UNPB
T1 - AfriMTE and AfriCOMET
T2 - Empowering COMET to Embrace Under-resourced African Languages
AU - Wang, Jiayi
AU - Adelani, David Ifeoluwa
AU - Agrawal, Sweta
AU - Rei, Ricardo
AU - Briakou, Eleftheria
AU - Carpuat, Marine
AU - Masiak, Marek
AU - He, Xuanli
AU - Bourhim, Sofia
AU - Bukula, Andiswa
AU - Mohamed, Muhidin
AU - Olatoye, Temitayo
AU - Mokayede, Hamam
AU - Mwase, Christine
AU - Kimotho, Wangui
AU - Yuehgoh, Foutse
AU - Aremu, Anuoluwapo
AU - Ojo, Jessica
AU - Muhammad, Shamsuddeen Hassan
AU - Osei, Salomey
AU - Omotayo, Abdul-Hakeem
AU - Chukwuneke, Chiamaka
AU - Ogayo, Perez
AU - Hourrane, Oumaima
AU - Anigri, Salma El
AU - Ndolela, Lolwethu
AU - Mangwana, Thabiso
AU - Mohamed, Shafie Abdi
AU - Hassan, Ayinde
AU - Awoyomi, Oluwabusayo Olufunke
AU - Alkhaled, Lama
AU - Al-Azzawi, Sana
AU - Etori, Naome A.
AU - Ochieng, Millicent
AU - Siro, Clemencia
AU - Njoroge, Samuel
AU - Muchiri, Eric
AU - Kimotho, Wangari
AU - Momo, Lyse Naomi Wamba
AU - Abolade, Daud
AU - Ajao, Simbiat
AU - Adewumi, Tosin
AU - Shode, Iyanuoluwa
AU - Macharm, Ricky
AU - Iro, Ruqayya Nasir
AU - Abdullahi, Saheed S.
AU - Moore, Stephen E.
AU - Opoku, Bernard
AU - Akinjobi, Zainab
AU - Afolabi, Abeeb
AU - Obiefuna, Nnaemeka
AU - Ogbu, Onyekachi Raphael
AU - Brian, Sam
AU - Otiende, Verrah Akinyi
AU - Mbonu, Chinedu Emmanuel
AU - Sari, Sakayo Toadoum
AU - Stenetorp, Pontus
PY - 2023/11/16
Y1 - 2023/11/16
N2 - Despite the progress recorded in scaling multilingual machine translation (MT) models and evaluation data to several under-resourced African languages, it is difficult to measure accurately how much progress has been made on these languages, because evaluation often relies on n-gram matching metrics such as BLEU, which correlate poorly with human judgments. Embedding-based metrics such as COMET correlate better; however, the lack of evaluation data with human ratings for under-resourced languages, the complexity of annotation guidelines such as Multidimensional Quality Metrics (MQM), and the limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data, with a simplified MQM guideline for error-span annotation and direct assessment (DA) scoring, for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET, a COMET evaluation metric for African languages, by leveraging DA training data from high-resource languages and an African-centric multilingual encoder (AfroXLM-Roberta) to create a state-of-the-art MT evaluation metric for African languages with respect to Spearman-rank correlation with human judgments (+0.406).
AB - Despite the progress recorded in scaling multilingual machine translation (MT) models and evaluation data to several under-resourced African languages, it is difficult to measure accurately how much progress has been made on these languages, because evaluation often relies on n-gram matching metrics such as BLEU, which correlate poorly with human judgments. Embedding-based metrics such as COMET correlate better; however, the lack of evaluation data with human ratings for under-resourced languages, the complexity of annotation guidelines such as Multidimensional Quality Metrics (MQM), and the limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data, with a simplified MQM guideline for error-span annotation and direct assessment (DA) scoring, for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET, a COMET evaluation metric for African languages, by leveraging DA training data from high-resource languages and an African-centric multilingual encoder (AfroXLM-Roberta) to create a state-of-the-art MT evaluation metric for African languages with respect to Spearman-rank correlation with human judgments (+0.406).
KW - cs.CL
M3 - Preprint
BT - AfriMTE and AfriCOMET
PB - arXiv
ER -
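The abstract reports metric quality as Spearman-rank correlation between automatic metric scores and human judgments. As an illustrative sketch (not code from the paper, and using made-up scores), Spearman's correlation is simply the Pearson correlation computed over the ranks of the two score lists, with tied values assigned their average rank:

```python
def rankdata(values):
    """Assign 1-based ranks, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical example: segment-level metric scores vs. human DA scores.
metric_scores = [0.82, 0.41, 0.77, 0.15, 0.60]
human_scores = [90, 55, 80, 20, 70]
print(spearman(metric_scores, human_scores))
```

A perfect monotonic agreement between metric and human rankings yields +1.0; the +0.406 figure reported in the abstract is the improvement achieved on African-language MT evaluation.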