ParlaMint II: advancing comparable parliamentary corpora across Europe

Research output: Contribution to Journal/Magazine > Journal article > peer-review

E-pub ahead of print

Standard

ParlaMint II: advancing comparable parliamentary corpora across Europe. / Erjavec, Tomaž; Kopp, Matyáš; Ljubešić, Nikola et al.
In: Language Resources and Evaluation, 28.12.2024.

Harvard

Erjavec, T, Kopp, M, Ljubešić, N, Kuzman, T, Rayson, P, Osenova, P, Ogrodniczuk, M, Çöltekin, Ç, Koržinek, D, Meden, K, Skubic, J, Rupnik, P, Agnoloni, T, Aires, J, Barkarson, S, Bartolini, R, Bel, N, Calzada Pérez, M, Darģis, R, Diwersy, S, Gavriilidou, M, van Heusden, R, Iruskieta, M, Kahusk, N, Kryvenko, A, Ligeti-Nagy, N, Magariños, C, Mölder, M, Navarretta, C, Simov, K, Tungland, LM, Tuominen, J, Vidler, J, Vladu, AI, Wissik, T, Yrjänäinen, V & Fišer, D 2024, 'ParlaMint II: advancing comparable parliamentary corpora across Europe', Language Resources and Evaluation. https://doi.org/10.1007/s10579-024-09798-w

APA

Erjavec, T., Kopp, M., Ljubešić, N., Kuzman, T., Rayson, P., Osenova, P., Ogrodniczuk, M., Çöltekin, Ç., Koržinek, D., Meden, K., Skubic, J., Rupnik, P., Agnoloni, T., Aires, J., Barkarson, S., Bartolini, R., Bel, N., Calzada Pérez, M., Darģis, R., ... Fišer, D. (2024). ParlaMint II: advancing comparable parliamentary corpora across Europe. Language Resources and Evaluation. Advance online publication. https://doi.org/10.1007/s10579-024-09798-w

Vancouver

Erjavec T, Kopp M, Ljubešić N, Kuzman T, Rayson P, Osenova P et al. ParlaMint II: advancing comparable parliamentary corpora across Europe. Language Resources and Evaluation. 2024 Dec 28. Epub 2024 Dec 28. doi: 10.1007/s10579-024-09798-w

Author

Erjavec, Tomaž ; Kopp, Matyáš ; Ljubešić, Nikola et al. / ParlaMint II: advancing comparable parliamentary corpora across Europe. In: Language Resources and Evaluation. 2024.

Bibtex

@article{08631e56398544f3b735b30b94b3ce02,
title = "ParlaMint II: advancing comparable parliamentary corpora across Europe",
abstract = "The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancement made since the ParlaMint I project and presents the compilation of the corpora, including the encoding infrastructure, use of GitHub, the production of individual corpora, the common pipeline for producing their distribution, and use of CLARIN services for dissemination. It then gives a quantitative overview of the produced corpora, followed by the qualitative additions made within the ParlaMint II project, namely metadata localisation, the addition of new metadata, such as the political orientation of political parties, the machine translation of the corpora to English and its tagging with semantic classes, and the production of pilot speech corpora. Finally, outreach activities and further work are discussed.",
keywords = "Comparable corpora, Parliamentary proceedings, TEI",
author = "Toma{\v z} Erjavec and Maty{\'a}{\v s} Kopp and Nikola Ljube{\v s}i{\'c} and Taja Kuzman and Paul Rayson and Petya Osenova and Maciej Ogrodniczuk and {\c C}a{\u g}r{\i} {\c C}{\"o}ltekin and Danijel Kor{\v z}inek and Katja Meden and Jure Skubic and Peter Rupnik and Tommaso Agnoloni and Jos{\'e} Aires and Starka{\dh}ur Barkarson and Roberto Bartolini and N{\'u}ria Bel and {Calzada P{\'e}rez}, Mar{\'i}a and Roberts Dar{\c g}is and Sascha Diwersy and Maria Gavriilidou and {van Heusden}, Ruben and Mikel Iruskieta and Neeme Kahusk and Anna Kryvenko and No{\'e}mi Ligeti-Nagy and Carmen Magari{\~n}os and Martin M{\"o}lder and Costanza Navarretta and Kiril Simov and Tungland, {Lars Magne} and Jouni Tuominen and John Vidler and Vladu, {Adina Ioana} and Tanja Wissik and V{\"a}in{\"o} Yrj{\"a}n{\"a}inen and Darja Fi{\v s}er",
year = "2024",
month = dec,
day = "28",
doi = "10.1007/s10579-024-09798-w",
language = "English",
journal = "Language Resources and Evaluation",
issn = "1574-020X",
publisher = "Springer Netherlands",

}

RIS

TY - JOUR

T1 - ParlaMint II

T2 - advancing comparable parliamentary corpora across Europe

AU - Erjavec, Tomaž

AU - Kopp, Matyáš

AU - Ljubešić, Nikola

AU - Kuzman, Taja

AU - Rayson, Paul

AU - Osenova, Petya

AU - Ogrodniczuk, Maciej

AU - Çöltekin, Çağrı

AU - Koržinek, Danijel

AU - Meden, Katja

AU - Skubic, Jure

AU - Rupnik, Peter

AU - Agnoloni, Tommaso

AU - Aires, José

AU - Barkarson, Starkaður

AU - Bartolini, Roberto

AU - Bel, Núria

AU - Calzada Pérez, María

AU - Darģis, Roberts

AU - Diwersy, Sascha

AU - Gavriilidou, Maria

AU - van Heusden, Ruben

AU - Iruskieta, Mikel

AU - Kahusk, Neeme

AU - Kryvenko, Anna

AU - Ligeti-Nagy, Noémi

AU - Magariños, Carmen

AU - Mölder, Martin

AU - Navarretta, Costanza

AU - Simov, Kiril

AU - Tungland, Lars Magne

AU - Tuominen, Jouni

AU - Vidler, John

AU - Vladu, Adina Ioana

AU - Wissik, Tanja

AU - Yrjänäinen, Väinö

AU - Fišer, Darja

PY - 2024/12/28

Y1 - 2024/12/28

N2 - The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancement made since the ParlaMint I project and presents the compilation of the corpora, including the encoding infrastructure, use of GitHub, the production of individual corpora, the common pipeline for producing their distribution, and use of CLARIN services for dissemination. It then gives a quantitative overview of the produced corpora, followed by the qualitative additions made within the ParlaMint II project, namely metadata localisation, the addition of new metadata, such as the political orientation of political parties, the machine translation of the corpora to English and its tagging with semantic classes, and the production of pilot speech corpora. Finally, outreach activities and further work are discussed.

AB - The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancement made since the ParlaMint I project and presents the compilation of the corpora, including the encoding infrastructure, use of GitHub, the production of individual corpora, the common pipeline for producing their distribution, and use of CLARIN services for dissemination. It then gives a quantitative overview of the produced corpora, followed by the qualitative additions made within the ParlaMint II project, namely metadata localisation, the addition of new metadata, such as the political orientation of political parties, the machine translation of the corpora to English and its tagging with semantic classes, and the production of pilot speech corpora. Finally, outreach activities and further work are discussed.

KW - Comparable corpora

KW - Parliamentary proceedings

KW - TEI

U2 - 10.1007/s10579-024-09798-w

DO - 10.1007/s10579-024-09798-w

M3 - Journal article

JO - Language Resources and Evaluation

JF - Language Resources and Evaluation

SN - 1574-020X

ER -