
Electronic data

  • s10579-021-09574-0

    Final published version, 2.14 MB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License


The ParlaMint corpora of parliamentary proceedings

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published
  • Tomaž Erjavec
  • Maciej Ogrodniczuk
  • Petya Osenova
  • Nikola Ljubešić
  • Kiril Simov
  • Andrej Pančur
  • Michał Rudolf
  • Matyáš Kopp
  • Starkaður Barkarson
  • Steinþór Steingrímsson
  • Çağrı Çöltekin
  • Jesse de Does
  • Katrien Depuydt
  • Tommaso Agnoloni
  • Giulia Venturi
  • María Calzada Pérez
  • Luciana D. de Macedo
  • Costanza Navarretta
  • Giancarlo Luxardo
  • Vaidas Morkevičius
  • Tomas Krilavičius
  • Roberts Darģis
  • Orsolya Ring
  • Ruben van Heusden
  • Maarten Marx
  • Darja Fišer
Journal publication date: 31/03/2023
Journal: Language Resources and Evaluation
Issue number: 1
Volume: 57
Number of pages: 34
Pages (from-to): 415-448
Publication Status: Published
Early online date: 02/02/2022
Original language: English

Abstract

This paper presents the ParlaMint corpora, which contain transcriptions of the sessions of 17 European national parliaments and total half a billion words. The corpora are uniformly encoded, contain rich metadata about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project’s GitHub repository. The complete corpora are openly available for download via the CLARIN.SI repository, and for online exploration and analysis through the NoSketch Engine and KonText concordancers and the Parlameter interface.