Arabic multi-document text summarisation

Computing and Communications

Associated organisational unit

UCREL - University Centre for Computer Corpus Research on Language

Electronic data

Mahmoud_ELHAJ_PHD_Thesis_2012
Accepted author manuscript, 3.26 MB, PDF document

Research output: Thesis › Doctoral Thesis

Published

Standard

Arabic multi-document text summarisation. / El-Haj, Mahmoud.
Colchester, Essex: University of Essex, 2012. 165 p.

Harvard

El-Haj, M 2012, 'Arabic multi-document text summarisation', Colchester, Essex. <http://serlib0.essex.ac.uk/record=b1807018~S5>

APA

El-Haj, M. (2012). Arabic multi-document text summarisation. University of Essex.

Vancouver

El-Haj M. Arabic multi-document text summarisation. Colchester, Essex: University of Essex, 2012. 165 p.

Author

El-Haj, Mahmoud. / Arabic multi-document text summarisation. Colchester, Essex: University of Essex, 2012. 165 p.

Bibtex

@phdthesis{16543eb7f4dd4437b66b4069a60fb7b8,

title = "Arabic multi-document text summarisation",

abstract = "Multi-document summarisation is the process of producing a single summary of a collection of related documents. Much of the current work on multi-document text summarisation is concerned with the English language; relevant resources are numerous and readily available. These resources include human-generated (gold-standard) and automatic summaries. Arabic multi-document summarisation is still in its infancy. One of the obstacles to progress is the limited availability of Arabic resources to support this research. When we started our research there were no publicly available Arabic multi-document gold-standard summaries, which are needed to automatically evaluate system-generated summaries. The Document Understanding Conference (DUC) and Text Analysis Conference (TAC) at that time provided resources such as gold-standard extractive and abstractive summaries (both human and system generated) that were only available in English. Our aim was to push forward the state of the art in Arabic multi-document summarisation. This required advancements in at least two areas. The first area was the creation of Arabic test collections. The second area was concerned with the actual summarisation process to find methods that improve the quality of Arabic summaries. To address both points we created single and multi-document Arabic test collections both automatically and manually using a commonly used English dataset and by having human participants. We developed extractive language-dependent and language-independent single and multi-document summarisers, both for Arabic and English. In our work we provided state-of-the-art approaches for Arabic multi-document summarisation. We succeeded in including Arabic in one of the leading summarisation conferences, the Text Analysis Conference (TAC). Researchers on Arabic multi-document summarisation now have resources and tools that can be used to advance the research in this field.",

author = "Mahmoud El-Haj",

note = "Thesis (Ph.D.), School of Computer Science and Electronic Engineering, University of Essex, 2012",

year = "2012",

language = "English",

publisher = "University of Essex",

}

RIS

TY - THES

T1 - Arabic multi-document text summarisation

AU - El-Haj, Mahmoud

N1 - Thesis (Ph.D.), School of Computer Science and Electronic Engineering, University of Essex, 2012

PY - 2012

Y1 - 2012

N2 - Multi-document summarisation is the process of producing a single summary of a collection of related documents. Much of the current work on multi-document text summarisation is concerned with the English language; relevant resources are numerous and readily available. These resources include human-generated (gold-standard) and automatic summaries. Arabic multi-document summarisation is still in its infancy. One of the obstacles to progress is the limited availability of Arabic resources to support this research. When we started our research there were no publicly available Arabic multi-document gold-standard summaries, which are needed to automatically evaluate system-generated summaries. The Document Understanding Conference (DUC) and Text Analysis Conference (TAC) at that time provided resources such as gold-standard extractive and abstractive summaries (both human and system generated) that were only available in English. Our aim was to push forward the state of the art in Arabic multi-document summarisation. This required advancements in at least two areas. The first area was the creation of Arabic test collections. The second area was concerned with the actual summarisation process to find methods that improve the quality of Arabic summaries. To address both points we created single and multi-document Arabic test collections both automatically and manually using a commonly used English dataset and by having human participants. We developed extractive language-dependent and language-independent single and multi-document summarisers, both for Arabic and English. In our work we provided state-of-the-art approaches for Arabic multi-document summarisation. We succeeded in including Arabic in one of the leading summarisation conferences, the Text Analysis Conference (TAC). Researchers on Arabic multi-document summarisation now have resources and tools that can be used to advance the research in this field.

M3 - Doctoral Thesis

PB - University of Essex

CY - Colchester, Essex

ER -
