Electronic data

  • ParlaClarin_II (1)

    Accepted author manuscript, 468 KB, PDF document

  • 2020.parlaclarin-1.5

    Final published version, 525 KB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Unfinished Business: Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN > Conference contribution/Paper > peer-review

Published

Standard

Unfinished Business: Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day. / Coole, Matthew; Rayson, Paul; Mariani, John.

Proceedings of the Second ParlaCLARIN Workshop. ed. / Darja Fišer; Maria Eskevich; Franciska de Jong. Paris : European Language Resources Association (ELRA), 2020. p. 23-27.

Harvard

Coole, M, Rayson, P & Mariani, J 2020, Unfinished Business: Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day. in D Fišer, M Eskevich & F de Jong (eds), Proceedings of the Second ParlaCLARIN Workshop. European Language Resources Association (ELRA), Paris, pp. 23-27. <https://www.aclweb.org/anthology/2020.parlaclarin-1.5>

APA

Coole, M., Rayson, P., & Mariani, J. (2020). Unfinished Business: Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day. In D. Fišer, M. Eskevich, & F. de Jong (Eds.), Proceedings of the Second ParlaCLARIN Workshop (pp. 23-27). European Language Resources Association (ELRA). https://www.aclweb.org/anthology/2020.parlaclarin-1.5

Vancouver

Coole M, Rayson P, Mariani J. Unfinished Business: Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day. In Fišer D, Eskevich M, de Jong F, editors, Proceedings of the Second ParlaCLARIN Workshop. Paris: European Language Resources Association (ELRA). 2020. p. 23-27

Author

Coole, Matthew ; Rayson, Paul ; Mariani, John. / Unfinished Business: Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day. Proceedings of the Second ParlaCLARIN Workshop. editor / Darja Fišer ; Maria Eskevich ; Franciska de Jong. Paris : European Language Resources Association (ELRA), 2020. pp. 23-27

Bibtex

@inproceedings{0a3dcb4be1794a288906ec928b0012ad,
title = "Unfinished Business: Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day",
abstract = "Creating, curating and maintaining modern political corpora is becoming an ever more involved task. As interest from various social bodies and the general public in political discourse grows so too does the need to enrich such datasets with metadata and linguistic annotations. Beyond this, such corpora must be easy to browse and search for linguists, social scientists, digital humanists and the general public. We present our efforts to compile a linguistically annotated and semantically tagged version of the Hansard corpus from 1803 right up to the present day. This involves combining multiple sources of documents and transcripts. We describe our toolchain for tagging; using several existing tools that provide tokenisation, part-of-speech tagging and semantic annotations. We also provide an overview of our bespoke web-based search interface built on LexiDB. In conclusion, we examine the completed corpus by looking at four case studies making use of semantic categories made available by our toolchain.",
author = "Matthew Coole and Paul Rayson and John Mariani",
year = "2020",
month = may,
day = "11",
language = "English",
isbn = "9791095546474",
pages = "23--27",
editor = "Fi{\v s}er, Darja and Eskevich, Maria and {de Jong}, Franciska",
booktitle = "Proceedings of the Second ParlaCLARIN Workshop",
publisher = "European Language Resources Association (ELRA)",

}

RIS

TY - GEN

T1 - Unfinished Business

T2 - Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day

AU - Coole, Matthew

AU - Rayson, Paul

AU - Mariani, John

PY - 2020/5/11

Y1 - 2020/5/11

N2 - Creating, curating and maintaining modern political corpora is becoming an ever more involved task. As interest from various social bodies and the general public in political discourse grows so too does the need to enrich such datasets with metadata and linguistic annotations. Beyond this, such corpora must be easy to browse and search for linguists, social scientists, digital humanists and the general public. We present our efforts to compile a linguistically annotated and semantically tagged version of the Hansard corpus from 1803 right up to the present day. This involves combining multiple sources of documents and transcripts. We describe our toolchain for tagging; using several existing tools that provide tokenisation, part-of-speech tagging and semantic annotations. We also provide an overview of our bespoke web-based search interface built on LexiDB. In conclusion, we examine the completed corpus by looking at four case studies making use of semantic categories made available by our toolchain.

AB - Creating, curating and maintaining modern political corpora is becoming an ever more involved task. As interest from various social bodies and the general public in political discourse grows so too does the need to enrich such datasets with metadata and linguistic annotations. Beyond this, such corpora must be easy to browse and search for linguists, social scientists, digital humanists and the general public. We present our efforts to compile a linguistically annotated and semantically tagged version of the Hansard corpus from 1803 right up to the present day. This involves combining multiple sources of documents and transcripts. We describe our toolchain for tagging; using several existing tools that provide tokenisation, part-of-speech tagging and semantic annotations. We also provide an overview of our bespoke web-based search interface built on LexiDB. In conclusion, we examine the completed corpus by looking at four case studies making use of semantic categories made available by our toolchain.

M3 - Conference contribution/Paper

SN - 9791095546474

SP - 23

EP - 27

BT - Proceedings of the Second ParlaCLARIN Workshop

A2 - Fišer, Darja

A2 - Eskevich, Maria

A2 - de Jong, Franciska

PB - European Language Resources Association (ELRA)

CY - Paris

ER -