Building and annotating a corpus for the study of journalistic text reuse

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN, Conference contribution/Paper, peer-review

Published

Standard

Building and annotating a corpus for the study of journalistic text reuse. / Piao, Scott; Clough, Paul; Gaizauskas, Robert.
3rd International Conference on Language Resources and Evaluation (LREC-2002). Las Palmas de Gran Canaria, Spain, 2002. p. 1678-1691.

Harvard

Piao, S, Clough, P & Gaizauskas, R 2002, Building and annotating a corpus for the study of journalistic text reuse. in 3rd International Conference on Language Resources and Evaluation (LREC-2002). Las Palmas de Gran Canaria, Spain, pp. 1678-1691. <http://www.lrec-conf.org/proceedings/lrec2002/>

APA

Piao, S., Clough, P., & Gaizauskas, R. (2002). Building and annotating a corpus for the study of journalistic text reuse. In 3rd International Conference on Language Resources and Evaluation (LREC-2002) (pp. 1678-1691). http://www.lrec-conf.org/proceedings/lrec2002/

Vancouver

Piao S, Clough P, Gaizauskas R. Building and annotating a corpus for the study of journalistic text reuse. In 3rd International Conference on Language Resources and Evaluation (LREC-2002). Las Palmas de Gran Canaria, Spain. 2002. p. 1678-1691

Author

Piao, Scott; Clough, Paul; Gaizauskas, Robert. / Building and annotating a corpus for the study of journalistic text reuse. 3rd International Conference on Language Resources and Evaluation (LREC-2002). Las Palmas de Gran Canaria, Spain, 2002. pp. 1678-1691

Bibtex

@inproceedings{1a64ad02cda04e0f9fc1d4fe7738804e,
title = "Building and annotating a corpus for the study of journalistic text reuse",
abstract = "In this paper we present the METER Corpus, a novel resource for the study and analysis of journalistic text reuse. The corpus consists of a set of news stories written by the Press Association (PA), the major UK news agency, and a set of stories about the same news events, as published in various British newspapers. In some cases the newspaper stories are rewritten from the PA source; in other cases they have been independently written by the newspapers' own journalists. We discuss the motivation for creating the corpus, its contents, the annotation of certain attributes for analysis of text reuse and finally the encoding of those annotations into a standardised corpus format: the Text Encoding Initiative (TEI).",
keywords = "Journalistic text reuse, TEI markup, Corpus annotation, Corpus, Paraphrase",
author = "Scott Piao and Paul Clough and Robert Gaizauskas",
year = "2002",
language = "English",
pages = "1678--1691",
booktitle = "3rd International Conference on Language Resources and Evaluation (LREC-2002)",
}

RIS

TY  - GEN
T1  - Building and annotating a corpus for the study of journalistic text reuse
AU  - Piao, Scott
AU  - Clough, Paul
AU  - Gaizauskas, Robert
PY  - 2002
Y1  - 2002
AB  - In this paper we present the METER Corpus, a novel resource for the study and analysis of journalistic text reuse. The corpus consists of a set of news stories written by the Press Association (PA), the major UK news agency, and a set of stories about the same news events, as published in various British newspapers. In some cases the newspaper stories are rewritten from the PA source; in other cases they have been independently written by the newspapers' own journalists. We discuss the motivation for creating the corpus, its contents, the annotation of certain attributes for analysis of text reuse and finally the encoding of those annotations into a standardised corpus format: the Text Encoding Initiative (TEI).
KW  - Journalistic text reuse
KW  - TEI markup
KW  - Corpus annotation
KW  - Corpus
KW  - Paraphrase
M3  - Conference contribution/Paper
SP  - 1678
EP  - 1691
BT  - 3rd International Conference on Language Resources and Evaluation (LREC-2002)
CY  - Las Palmas de Gran Canaria, Spain
ER  -