Unfinished Business - Research Portal

Electronic data

ParlaClarin_II (1)
Accepted author manuscript, 468 KB, PDF document
2020.parlaclarin-1.5
Final published version, 525 KB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Unfinished Business: Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Unfinished Business: Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day. / Coole, Matthew; Rayson, Paul; Mariani, John.
Proceedings of the Second ParlaCLARIN Workshop. eds. / Darja Fišer; Maria Eskevich; Franciska de Jong. Paris: European Language Resources Association (ELRA), 2020. p. 23-27.

Bibtex

@inproceedings{0a3dcb4be1794a288906ec928b0012ad,

title = "Unfinished Business: Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day",

abstract = "Creating, curating and maintaining modern political corpora is becoming an ever more involved task. As interest from various social bodies and the general public in political discourse grows so too does the need to enrich such datasets with metadata and linguistic annotations. Beyond this, such corpora must be easy to browse and search for linguists, social scientists, digital humanists and the general public. We present our efforts to compile a linguistically annotated and semantically tagged version of the Hansard corpus from 1803 right up to the present day. This involves combining multiple sources of documents and transcripts. We describe our toolchain for tagging; using several existing tools that provide tokenisation, part-of-speech tagging and semantic annotations. We also provide an overview of our bespoke web-based search interface built on LexiDB. In conclusion, we examine the completed corpus by looking at four case studies making use of semantic categories made available by our toolchain.",

author = "Matthew Coole and Paul Rayson and John Mariani",

year = "2020",

month = may,

day = "11",

language = "English",

isbn = "9791095546474",

pages = "23--27",

editor = "Fi{\v s}er, Darja and Eskevich, Maria and {de Jong}, Franciska",

booktitle = "Proceedings of the Second ParlaCLARIN Workshop",

publisher = "European Language Resources Association (ELRA)",

address = "Paris",

}

RIS

TY  - GEN
T1  - Unfinished Business
T2  - Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day
AU  - Coole, Matthew
AU  - Rayson, Paul
AU  - Mariani, John
PY  - 2020/5/11
Y1  - 2020/5/11
N2  - Creating, curating and maintaining modern political corpora is becoming an ever more involved task. As interest from various social bodies and the general public in political discourse grows so too does the need to enrich such datasets with metadata and linguistic annotations. Beyond this, such corpora must be easy to browse and search for linguists, social scientists, digital humanists and the general public. We present our efforts to compile a linguistically annotated and semantically tagged version of the Hansard corpus from 1803 right up to the present day. This involves combining multiple sources of documents and transcripts. We describe our toolchain for tagging; using several existing tools that provide tokenisation, part-of-speech tagging and semantic annotations. We also provide an overview of our bespoke web-based search interface built on LexiDB. In conclusion, we examine the completed corpus by looking at four case studies making use of semantic categories made available by our toolchain.
AB  - Creating, curating and maintaining modern political corpora is becoming an ever more involved task. As interest from various social bodies and the general public in political discourse grows so too does the need to enrich such datasets with metadata and linguistic annotations. Beyond this, such corpora must be easy to browse and search for linguists, social scientists, digital humanists and the general public. We present our efforts to compile a linguistically annotated and semantically tagged version of the Hansard corpus from 1803 right up to the present day. This involves combining multiple sources of documents and transcripts. We describe our toolchain for tagging; using several existing tools that provide tokenisation, part-of-speech tagging and semantic annotations. We also provide an overview of our bespoke web-based search interface built on LexiDB. In conclusion, we examine the completed corpus by looking at four case studies making use of semantic categories made available by our toolchain.
M3  - Conference contribution/Paper
SN  - 9791095546474
SP  - 23
EP  - 27
BT  - Proceedings of the Second ParlaCLARIN Workshop
A2  - Fišer, Darja
A2  - Eskevich, Maria
A2  - de Jong, Franciska
PB  - European Language Resources Association (ELRA)
CY  - Paris
ER  - 
