Building and annotating a corpus for the study of journalistic text reuse

Computing and Communications

Keywords

Journalistic text reuse, TEI markup, Corpus annotation, Corpus, Paraphrase

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Scott Piao
Paul Clough
Robert Gaizauskas

Publication date	2002
Host publication	3rd International Conference on Language Resources and Evaluation (LREC-2002)
Place of Publication	Las Palmas de Gran Canaria, Spain
Pages	1678-1691
Number of pages	14
Original language	English

Abstract

In this paper we present the METER Corpus, a novel resource for the study and analysis of journalistic text reuse. The corpus consists of a set of news stories written by the Press Association (PA), the major UK news agency, and a set of stories about the same news events, as published in various British newspapers. In some cases the newspaper stories are rewritten from the PA source; in other cases they have been independently written by the newspapers' own journalists. We discuss the motivation for creating the corpus, its contents, the annotation of certain attributes for the analysis of text reuse, and finally the encoding of those annotations in a standardised corpus format: the Text Encoding Initiative (TEI).