Enhancing retrieval effectiveness of diacritisized Arabic passages using stemmer and thesaurus

Computing and Communications

Associated organisational unit

UCREL - University Centre for Computer Corpus Research on Language

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Enhancing retrieval effectiveness of diacritisized Arabic passages using stemmer and thesaurus. / Hammo, Bassam; Sleit, Azzam; El-Haj, Mahmoud.
The 19th Midwest Artificial Intelligence And Cognitive Science Conference Maics2008. 2008. p. 189–196.

Harvard

Hammo, B, Sleit, A & El-Haj, M 2008, Enhancing retrieval effectiveness of diacritisized Arabic passages using stemmer and thesaurus. in The 19th Midwest Artificial Intelligence And Cognitive Science Conference Maics2008. pp. 189–196.

APA

Hammo, B., Sleit, A., & El-Haj, M. (2008). Enhancing retrieval effectiveness of diacritisized Arabic passages using stemmer and thesaurus. In The 19th Midwest Artificial Intelligence And Cognitive Science Conference Maics2008 (pp. 189–196).

Vancouver

Hammo B, Sleit A, El-Haj M. Enhancing retrieval effectiveness of diacritisized Arabic passages using stemmer and thesaurus. In The 19th Midwest Artificial Intelligence And Cognitive Science Conference Maics2008. 2008. p. 189–196.

Author

Hammo, Bassam ; Sleit, Azzam ; El-Haj, Mahmoud. / Enhancing retrieval effectiveness of diacritisized Arabic passages using stemmer and thesaurus. The 19th Midwest Artificial Intelligence And Cognitive Science Conference Maics2008. 2008. pp. 189–196

Bibtex

@inproceedings{b0605b04271f4572a38b314583d5564b,

title = "Enhancing retrieval effectiveness of diacritisized Arabic passages using stemmer and thesaurus",

abstract = "In this paper we discuss the enhancement of Arabic passage retrieval for both diacritisized and nondiacritisized text. Most previous work suggested that retrieval start with pre-processing the Arabic text to remove the diacritical marks (short vowels) to unify the text. In most cases, this process causes considerable ambiguity at the word level in the absence of context. However, searching for a word in diacritisized text requires typing and matching all its diacritical marks, which is cumbersome and prevents users from searching and hence retrieving valuable amount of text. The other way around, is to ignore these marks and fall into the problem of ambiguity. In this paper, we propose a passage retrieval approach to search for diacritic and diacritic-less text through query expansion to match a user{\textquoteright}s query. We applied a rule-based stemmer and we compiled a huge thesaurus for this purpose. We tested our approach on the scripts of the Quran as an open domain source of diacritisized text using a set of 40 non-diacritical words obtained from testers. The results are presented and the applied approach reveals future directions for search engines.",

author = "Bassam Hammo and Azzam Sleit and Mahmoud El-Haj",

year = "2008",

language = "English",

pages = "189--196",

booktitle = "The 19th Midwest Artificial Intelligence And Cognitive Science Conference Maics2008",

}

RIS

TY - GEN

T1 - Enhancing retrieval effectiveness of diacritisized Arabic passages using stemmer and thesaurus

AU - Hammo, Bassam

AU - Sleit, Azzam

AU - El-Haj, Mahmoud

PY - 2008

Y1 - 2008

N2 - In this paper we discuss the enhancement of Arabic passage retrieval for both diacritisized and nondiacritisized text. Most previous work suggested that retrieval start with pre-processing the Arabic text to remove the diacritical marks (short vowels) to unify the text. In most cases, this process causes considerable ambiguity at the word level in the absence of context. However, searching for a word in diacritisized text requires typing and matching all its diacritical marks, which is cumbersome and prevents users from searching and hence retrieving valuable amount of text. The other way around, is to ignore these marks and fall into the problem of ambiguity. In this paper, we propose a passage retrieval approach to search for diacritic and diacritic-less text through query expansion to match a user’s query. We applied a rule-based stemmer and we compiled a huge thesaurus for this purpose. We tested our approach on the scripts of the Quran as an open domain source of diacritisized text using a set of 40 non-diacritical words obtained from testers. The results are presented and the applied approach reveals future directions for search engines.

M3 - Conference contribution/Paper

SP - 189

EP - 196

BT - The 19th Midwest Artificial Intelligence And Cognitive Science Conference Maics2008

ER -
