Final published version, 2.14 MB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - The ParlaMint corpora of parliamentary proceedings
AU - Erjavec, Tomaž
AU - Ogrodniczuk, Maciej
AU - Osenova, Petya
AU - Ljubešić, Nikola
AU - Simov, Kiril
AU - Pančur, Andrej
AU - Rudolf, Michał
AU - Kopp, Matyáš
AU - Barkarson, Starkaður
AU - Steingrímsson, Steinþór
AU - Çöltekin, Çağrı
AU - de Does, Jesse
AU - Depuydt, Katrien
AU - Agnoloni, Tommaso
AU - Venturi, Giulia
AU - Pérez, María Calzada
AU - de Macedo, Luciana D.
AU - Navarretta, Costanza
AU - Luxardo, Giancarlo
AU - Coole, Matthew
AU - Rayson, Paul
AU - Morkevičius, Vaidas
AU - Krilavičius, Tomas
AU - Darǵis, Roberts
AU - Ring, Orsolya
AU - van Heusden, Ruben
AU - Marx, Maarten
AU - Fišer, Darja
PY - 2023/3/31
Y1 - 2023/3/31
N2 - This paper presents the ParlaMint corpora containing transcriptions of the sessions of the 17 European national parliaments with half a billion words. The corpora are uniformly encoded, contain rich meta-data about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project’s GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository for download, as well as through the NoSketch Engine and KonText concordancers and the Parlameter interface for on-line exploration and analysis.
AB - This paper presents the ParlaMint corpora containing transcriptions of the sessions of the 17 European national parliaments with half a billion words. The corpora are uniformly encoded, contain rich meta-data about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project’s GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository for download, as well as through the NoSketch Engine and KonText concordancers and the Parlameter interface for on-line exploration and analysis.
KW - Parliamentary proceedings
KW - Comparable corpora
KW - TEI
U2 - 10.1007/s10579-021-09574-0
DO - 10.1007/s10579-021-09574-0
M3 - Journal article
C2 - 35125984
VL - 57
SP - 415
EP - 448
JO - Language Resources and Evaluation
JF - Language Resources and Evaluation
SN - 1574-0218
IS - 1
ER -