AfriQA - Research Portal | Lancaster University

Computing and Communications

Electronic data

2305.06897v1
Other version, 408 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Keywords

cs.CL, cs.AI, cs.IR

AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages

Research output: Working paper › Preprint

Published

Odunayo Ogundepo
Tajuddeen R. Gwadabe
Clara E. Rivera
Jonathan H. Clark
Sebastian Ruder
David Ifeoluwa Adelani
Bonaventure F. P. Dossou
Aziz DIOP Abdou
Claytone Sikasote
Gilles Hacheme
Happy Buzaaba
Rooweither Mabuya
Salomey Osei
Chris Emezue
Albert Njoroge Kahira
Shamsuddeen H. Muhammad
Akintunde Oladipo
Abraham Toluwase Owodunni
Atnafu Lambebo Tonja
Iyanuoluwa Shode
Akari Asai
Tunde Oluwaseyi Ajayi
Clemencia Siro
Steven Arthur
Mofetoluwa Adeyemi
Orevaoghene Ahia
Aremu Anuoluwapo
Oyinkansola Awosan
Bernard Opoku
Awokoya Ayodele
Verrah Otiende
Christine Mwase
Boyd Sinkala
Andre Niyongabo Rubungo
Daniel A. Ajisafe
Emeka Felix Onwuegbuzia
Habib Mbow
Emile Niyomutabazi
Eunice Mukonde
Falalu Ibrahim Lawan
Ibrahim Said Ahmad
Jesujoba O. Alabi
Martin Namukombo
Mbonu Chinedu
Mofya Phiri
Neo Putini
Ndumiso Mngoma
Priscilla A. Amuok
Ruqayya Nasir Iro
Sonia Adhiambo

Publication date	11/05/2023
Publisher	arXiv
Original language	English

Abstract

African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems -- those that retrieve answer content from other languages while serving people in their native language -- offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.
