Electronic data

  • 2304.12155v1

    Accepted author manuscript, 78.3 KB, PDF document

The African Stopwords project: curating stopwords for African languages

Research output: Working paper › Preprint

Published

Standard

The African Stopwords project: curating stopwords for African languages. / Emezue, Chris; Nigatu, Hellina; Chukwuneke, Chiamaka et al.
arXiv, 2023.

Harvard

Emezue, C, Nigatu, H, Chukwuneke, C, Thinwa, C, Zhou, H, Muhammad, S, Louis, L, Abdulmumin, I, Oyerinde, S, Ajibade, B, Samuel, O, Joshua, O, Onwuegbuzia, E, Emezue, H, Ige, IA, Tonja, AL, Dossou, BFP, Etori, NA, Emmanuel, MC, Yousuf, O, Aina, K & David, D 2023 'The African Stopwords project: curating stopwords for African languages' arXiv.

APA

Emezue, C., Nigatu, H., Chukwuneke, C., Thinwa, C., Zhou, H., Muhammad, S., Louis, L., Abdulmumin, I., Oyerinde, S., Ajibade, B., Samuel, O., Joshua, O., Onwuegbuzia, E., Emezue, H., Ige, I. A., Tonja, A. L., Dossou, B. F. P., Etori, N. A., Emmanuel, M. C., ... David, D. (2023). The African Stopwords project: curating stopwords for African languages. arXiv.

Vancouver

Emezue C, Nigatu H, Chukwuneke C, Thinwa C, Zhou H, Muhammad S et al. The African Stopwords project: curating stopwords for African languages. arXiv. 2023 Mar 21.

BibTeX

@techreport{5abe226229a647fe9bd9a32d8ea92aa6,
title = "The African Stopwords project: curating stopwords for African languages",
abstract = "Stopwords are fundamental in Natural Language Processing (NLP) techniques for information retrieval. One common task in text preprocessing is the removal of stopwords. Currently, while high-resource languages like English benefit from the availability of several stopword lists, low-resource languages, such as those spoken on the African continent, have none that are standardized and available for use in NLP packages. Stopwords in the context of African languages are understudied and can reveal information about the crossover between languages. The \textit{African Stopwords} project aims to study and curate stopwords for African languages. In this paper, we present our current progress on ten African languages as well as future plans for the project.",
keywords = "cs.CL, cs.LG",
author = "Chris Emezue and Hellina Nigatu and Chiamaka Chukwuneke and Cynthia Thinwa and Helper Zhou and Shamsuddeen Muhammad and Lerato Louis and Idris Abdulmumin and Samuel Oyerinde and Benjamin Ajibade and Olanrewaju Samuel and Oviawe Joshua and Emeka Onwuegbuzia and Handel Emezue and Ige, {Ifeoluwatayo A.} and Tonja, {Atnafu Lambebo} and Dossou, {Bonaventure F. P.} and Etori, {Naome A.} and Emmanuel, {Mbonu Chinedu} and Oreen Yousuf and Kaosarat Aina and Davis David",
note = "Accepted at the AfricaNLP workshop at ICLR 2022",
year = "2023",
month = mar,
day = "21",
language = "English",
publisher = "arXiv",
type = "WorkingPaper",
institution = "arXiv",

}

RIS

TY - UNPB

T1 - The African Stopwords project: curating stopwords for African languages

AU - Emezue, Chris

AU - Nigatu, Hellina

AU - Chukwuneke, Chiamaka

AU - Thinwa, Cynthia

AU - Zhou, Helper

AU - Muhammad, Shamsuddeen

AU - Louis, Lerato

AU - Abdulmumin, Idris

AU - Oyerinde, Samuel

AU - Ajibade, Benjamin

AU - Samuel, Olanrewaju

AU - Joshua, Oviawe

AU - Onwuegbuzia, Emeka

AU - Emezue, Handel

AU - Ige, Ifeoluwatayo A.

AU - Tonja, Atnafu Lambebo

AU - Dossou, Bonaventure F. P.

AU - Etori, Naome A.

AU - Emmanuel, Mbonu Chinedu

AU - Yousuf, Oreen

AU - Aina, Kaosarat

AU - David, Davis

N1 - Accepted at the AfricaNLP workshop at ICLR 2022

PY - 2023/3/21

Y1 - 2023/3/21

AB - Stopwords are fundamental in Natural Language Processing (NLP) techniques for information retrieval. One common task in text preprocessing is the removal of stopwords. Currently, while high-resource languages like English benefit from the availability of several stopword lists, low-resource languages, such as those spoken on the African continent, have none that are standardized and available for use in NLP packages. Stopwords in the context of African languages are understudied and can reveal information about the crossover between languages. The African Stopwords project aims to study and curate stopwords for African languages. In this paper, we present our current progress on ten African languages as well as future plans for the project.

KW - cs.CL

KW - cs.LG

M3 - Preprint

BT - The African Stopwords project

PB - arXiv

ER -
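The abstract names stopword removal as a common text-preprocessing step. The following is a minimal Python sketch of that step under stated assumptions: whitespace tokenization, punctuation stripped from token edges, and a caller-supplied list stored one word per line. The helpers `load_stopwords` and `remove_stopwords`, the list format, and the four English example words are hypothetical illustrations. They are not the project's curated lists or any published API.

```python
import string
from typing import Iterable


def load_stopwords(lines: Iterable[str]) -> set[str]:
    """Build a stopword set from one-word-per-line text (hypothetical format).

    Blank lines and lines starting with '#' are skipped; entries are lowercased.
    """
    stopwords = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            stopwords.add(word)
    return stopwords


def remove_stopwords(text: str, stopwords: set[str]) -> list[str]:
    """Split on whitespace, trim edge punctuation, lowercase, drop stopwords.

    Splitting on whitespace (rather than a regex word class) keeps combining
    diacritics attached to their base letters, which matters for tone-marked
    orthographies; real pipelines may still need a language-aware tokenizer.
    """
    kept = []
    for raw in text.split():
        token = raw.strip(string.punctuation).lower()
        if token and token not in stopwords:
            kept.append(token)
    return kept


if __name__ == "__main__":
    # Tiny illustrative English list; not taken from the project.
    english = load_stopwords(["# tiny illustrative list", "the", "of", "is", "a", ""])
    print(remove_stopwords("The removal of stopwords is a common preprocessing step.", english))
```

Running the script prints `['removal', 'stopwords', 'common', 'preprocessing', 'step']`. Keeping the list in a plain set makes membership checks constant-time and lets a language-specific list be swapped in without changing the filtering code.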