A New Aligned Simple German Corpus

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN - Conference contribution/Paper - peer-review

Published

Standard

A New Aligned Simple German Corpus. / Toborek, Vanessa; Busch, Moritz; Boßert, Malte et al.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics (ACL Anthology), 2023. p. 11393-11412.

Harvard

Toborek, V, Busch, M, Boßert, M, Bauckhage, C & Welke, P 2023, A New Aligned Simple German Corpus. in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics (ACL Anthology), Stroudsburg, PA, pp. 11393-11412. https://doi.org/10.18653/V1/2023.ACL-LONG.638

APA

Toborek, V., Busch, M., Boßert, M., Bauckhage, C., & Welke, P. (2023). A New Aligned Simple German Corpus. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 11393-11412). Association for Computational Linguistics (ACL Anthology). https://doi.org/10.18653/V1/2023.ACL-LONG.638

Vancouver

Toborek V, Busch M, Boßert M, Bauckhage C, Welke P. A New Aligned Simple German Corpus. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics (ACL Anthology). 2023. p. 11393-11412. doi: 10.18653/V1/2023.ACL-LONG.638

Author

Toborek, Vanessa; Busch, Moritz; Boßert, Malte et al. / A New Aligned Simple German Corpus. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics (ACL Anthology), 2023. pp. 11393-11412

Bibtex

@inproceedings{5f0876d0ea5d468ba8b1eba8e55dcd6b,
title = "A New Aligned Simple German Corpus",
abstract = "“Leichte Sprache”, the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German – German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment methods. We evaluate our alignments based on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by the F1-score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under MIT license.",
author = "Vanessa Toborek and Moritz Busch and Malte Bo{\ss}ert and Christian Bauckhage and Pascal Welke",
note = "DBLP's bibliographic metadata records provided through http://dblp.org/search/publ/api are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.",
year = "2023",
month = jul,
day = "9",
doi = "10.18653/V1/2023.ACL-LONG.638",
language = "English",
pages = "11393--11412",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
publisher = "Association for Computational Linguistics (ACL Anthology)",

}

RIS

TY  - GEN
T1  - A New Aligned Simple German Corpus
AU  - Toborek, Vanessa
AU  - Busch, Moritz
AU  - Boßert, Malte
AU  - Bauckhage, Christian
AU  - Welke, Pascal
N1  - DBLP's bibliographic metadata records provided through http://dblp.org/search/publ/api are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.
PY  - 2023/7/9
Y1  - 2023/7/9
N2  - “Leichte Sprache”, the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German – German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment methods. We evaluate our alignments based on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by the F1-score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under MIT license.
AB  - “Leichte Sprache”, the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German – German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment methods. We evaluate our alignments based on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by the F1-score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under MIT license.
U2  - 10.18653/V1/2023.ACL-LONG.638
DO  - 10.18653/V1/2023.ACL-LONG.638
M3  - Conference contribution/Paper
SP  - 11393
EP  - 11412
BT  - Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
PB  - Association for Computational Linguistics (ACL Anthology)
CY  - Stroudsburg, PA
ER  -