Enhancing retrieval effectiveness of diacritisized Arabic passages using stemmer and thesaurus

Computing and Communications

Associated organisational unit

UCREL - University Centre for Computer Corpus Research on Language

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Bassam Hammo
Azzam Sleit
Mahmoud El-Haj

Publication date	2008
Host publication	The 19th Midwest Artificial Intelligence And Cognitive Science Conference Maics2008
Pages	189–196
Number of pages	8
Original language	English

Abstract

In this paper we discuss the enhancement of Arabic passage retrieval for both diacritisized and non-diacritisized text. Most previous work suggested that retrieval start by pre-processing the Arabic text to remove the diacritical marks (short vowels) and so unify the text. In most cases, this process causes considerable ambiguity at the word level in the absence of context. However, searching for a word in diacritisized text requires typing and matching all of its diacritical marks, which is cumbersome and prevents users from searching, and hence from retrieving, a valuable amount of text. The alternative is to ignore these marks and face the problem of ambiguity. In this paper, we propose a passage retrieval approach that searches both diacritisized and diacritic-less text by expanding the user's query. We applied a rule-based stemmer and compiled a large thesaurus for this purpose. We tested our approach on the text of the Quran, as an open-domain source of diacritisized text, using a set of 40 non-diacritical words obtained from testers. The results are presented, and the applied approach suggests future directions for search engines.
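
The idea of matching a diacritic-less query against diacritisized passages through query expansion can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy thesaurus, and the whole-word matching rule are assumptions made for illustration, and the rule-based stemmer described in the abstract is omitted entirely. The sketch only strips the common Arabic short-vowel marks (U+064B to U+0652, plus the superscript alef U+0670) before comparing words.

```python
import re

# Arabic diacritical marks: fathatan..sukun (U+064B-U+0652) and superscript alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text):
    """Return text with the Arabic diacritical marks removed."""
    return DIACRITICS.sub("", text)


def expand_query(query, thesaurus):
    """Return the bare query plus any related bare forms from a (hypothetical) thesaurus."""
    bare = strip_diacritics(query)
    related = thesaurus.get(bare, [])
    return {bare} | {strip_diacritics(word) for word in related}


def find_passages(query, passages, thesaurus):
    """Return passages containing any expanded query term, ignoring diacritics."""
    terms = expand_query(query, thesaurus)
    hits = []
    for passage in passages:
        bare_words = set(strip_diacritics(passage).split())
        if terms & bare_words:
            hits.append(passage)
    return hits


# Toy data, written with escapes so the diacritics are explicit.
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"        # "kataba" with short vowels
KATIB = "\u0643\u064E\u0627\u062A\u0650\u0628\u064C"   # "katibun" with short vowels
DHAHABA = "\u0630\u064E\u0647\u064E\u0628\u064E"       # "dhahaba" with short vowels

QUERY = "\u0643\u062A\u0628"                           # "ktb", typed without diacritics
# Hypothetical thesaurus entry linking the bare query to a related bare form.
THESAURUS = {QUERY: ["\u0643\u0627\u062A\u0628"]}

PASSAGES = [KATABA, KATIB, DHAHABA]
RESULTS = find_passages(QUERY, PASSAGES, THESAURUS)
```

In this toy setup the undiacritized query matches the first passage directly and the second one only through the thesaurus entry, while the third passage is not returned. A real system would also need to handle prefixes and suffixes such as the definite article, which is where a stemmer would come in.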
