Available under license: CC BY: Creative Commons Attribution 4.0 International License
Final published version
Research output: Working paper › Preprint
TY - UNPB
T1 - AfriQA
T2 - Cross-lingual Open-Retrieval Question Answering for African Languages
AU - Ogundepo, Odunayo
AU - Gwadabe, Tajuddeen R.
AU - Rivera, Clara E.
AU - Clark, Jonathan H.
AU - Ruder, Sebastian
AU - Adelani, David Ifeoluwa
AU - Ezeani, Ignatius
AU - Chukwuneke, Chiamaka
AU - Dossou, Bonaventure F. P.
AU - Diop, Abdou Aziz
AU - Sikasote, Claytone
AU - Hacheme, Gilles
AU - Buzaaba, Happy
AU - Mabuya, Rooweither
AU - Osei, Salomey
AU - Emezue, Chris
AU - Kahira, Albert Njoroge
AU - Muhammad, Shamsuddeen H.
AU - Oladipo, Akintunde
AU - Owodunni, Abraham Toluwase
AU - Tonja, Atnafu Lambebo
AU - Shode, Iyanuoluwa
AU - Asai, Akari
AU - Ajayi, Tunde Oluwaseyi
AU - Siro, Clemencia
AU - Arthur, Steven
AU - Adeyemi, Mofetoluwa
AU - Ahia, Orevaoghene
AU - Anuoluwapo, Aremu
AU - Awosan, Oyinkansola
AU - Opoku, Bernard
AU - Ayodele, Awokoya
AU - Otiende, Verrah
AU - Mwase, Christine
AU - Sinkala, Boyd
AU - Rubungo, Andre Niyongabo
AU - Ajisafe, Daniel A.
AU - Onwuegbuzia, Emeka Felix
AU - Mbow, Habib
AU - Niyomutabazi, Emile
AU - Mukonde, Eunice
AU - Lawan, Falalu Ibrahim
AU - Ahmad, Ibrahim Said
AU - Alabi, Jesujoba O.
AU - Namukombo, Martin
AU - Chinedu, Mbonu
AU - Phiri, Mofya
AU - Putini, Neo
AU - Mngoma, Ndumiso
AU - Amuok, Priscilla A.
AU - Iro, Ruqayya Nasir
AU - Adhiambo, Sonia
PY - 2023/5/11
Y1 - 2023/5/11
N2 - African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems -- those that retrieve answer content from other languages while serving people in their native language -- offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.
AB - African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems -- those that retrieve answer content from other languages while serving people in their native language -- offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.
KW - cs.CL
KW - cs.AI
KW - cs.IR
M3 - Preprint
BT - AfriQA
PB - arXiv
ER -