Arabic multi-document text summarisation

Research output: Thesis, Doctoral Thesis

Published
Publication date: 2012
Number of pages: 165
Place of publication: Colchester, Essex
Publisher: University of Essex
Original language: English

Abstract

Multi-document summarisation is the process of producing a single summary of a collection of related documents. Much of the current work on multi-document text summarisation is concerned with the English language, for which relevant resources are numerous and readily available. These resources include human-generated (gold-standard) and automatic summaries. Arabic multi-document summarisation is still in its infancy. One of the obstacles to progress is the limited availability of Arabic resources to support this research. When we started our research there were no publicly available Arabic multi-document gold-standard summaries, which are needed to automatically evaluate system-generated summaries. The Document Understanding Conference (DUC) and Text Analysis Conference (TAC) at that time provided resources such as gold-standard extractive and abstractive summaries (both human and system generated) that were only available in English. Our aim was to push forward the state of the art in Arabic multi-document summarisation. This required advancements in at least two areas. The first was the creation of Arabic test collections. The second concerned the summarisation process itself: finding methods that improve the quality of Arabic summaries. To address both points, we created single- and multi-document Arabic test collections, both automatically and manually, using a widely used English dataset and human participants. We developed extractive language-dependent and language-independent single- and multi-document summarisers for both Arabic and English. In our work we provided state-of-the-art approaches for Arabic multi-document summarisation. We succeeded in including Arabic in one of the leading summarisation conferences, the Text Analysis Conference (TAC). Researchers on Arabic multi-document summarisation now have resources and tools that can be used to advance the research in this field.

Bibliographic note

Thesis (Ph.D.), School of Computer Science and Electronic Engineering, University of Essex, 2012