Research output: Working paper › Preprint
}
TY - UNPB
T1 - The African Stopwords project
T2 - curating stopwords for African languages
AU - Emezue, Chris
AU - Nigatu, Hellina
AU - Chukwuneke, Chiamaka
AU - Thinwa, Cynthia
AU - Zhou, Helper
AU - Muhammad, Shamsuddeen
AU - Louis, Lerato
AU - Abdulmumin, Idris
AU - Oyerinde, Samuel
AU - Ajibade, Benjamin
AU - Samuel, Olanrewaju
AU - Joshua, Oviawe
AU - Onwuegbuzia, Emeka
AU - Emezue, Handel
AU - Ige, Ifeoluwatayo A.
AU - Tonja, Atnafu Lambebo
AU - Dossou, Bonaventure F. P.
AU - Etori, Naome A.
AU - Emmanuel, Mbonu Chinedu
AU - Yousuf, Oreen
AU - Aina, Kaosarat
AU - David, Davis
N1 - Accepted at the AfricaNLP workshop at ICLR 2022
PY - 2023/3/21
Y1 - 2023/3/21
N2 - Stopwords are fundamental in Natural Language Processing (NLP) techniques for information retrieval. One of the common tasks in preprocessing of text data is the removal of stopwords. Currently, while high-resource languages like English benefit from the availability of several stopwords, low-resource languages, such as those found in the African continent, have none that are standardized and available for use in NLP packages. Stopwords in the context of African languages are understudied and can reveal information about the crossover between languages. The African Stopwords project aims to study and curate stopwords for African languages. In this paper, we present our current progress on ten African languages as well as future plans for the project.
AB - Stopwords are fundamental in Natural Language Processing (NLP) techniques for information retrieval. One of the common tasks in preprocessing of text data is the removal of stopwords. Currently, while high-resource languages like English benefit from the availability of several stopwords, low-resource languages, such as those found in the African continent, have none that are standardized and available for use in NLP packages. Stopwords in the context of African languages are understudied and can reveal information about the crossover between languages. The African Stopwords project aims to study and curate stopwords for African languages. In this paper, we present our current progress on ten African languages as well as future plans for the project.
KW - cs.CL
KW - cs.LG
M3 - Preprint
BT - The African Stopwords project
PB - arXiv
ER -