Electronic data

  • 2209.01106v4

    Submitted manuscript, 716 KB, PDF document

    Available under license: CC BY-SA: Creative Commons Attribution-ShareAlike 4.0 International License

A New Aligned Simple German Corpus

Research output: Working paper › Preprint

Published

Standard

A New Aligned Simple German Corpus. / Toborek, Vanessa; Busch, Moritz; Boßert, Malte et al.
Arxiv, 2022.

Harvard

Toborek, V, Busch, M, Boßert, M, Bauckhage, C & Welke, P 2022 'A New Aligned Simple German Corpus' Arxiv. <https://arxiv.org/abs/2209.01106v4>

APA

Toborek, V., Busch, M., Boßert, M., Bauckhage, C., & Welke, P. (2022). A New Aligned Simple German Corpus. Arxiv. https://arxiv.org/abs/2209.01106v4

Vancouver

Toborek V, Busch M, Boßert M, Bauckhage C, Welke P. A New Aligned Simple German Corpus. Arxiv. 2022 Sept 2.

Author

Toborek, Vanessa; Busch, Moritz; Boßert, Malte et al. / A New Aligned Simple German Corpus. Arxiv, 2022.

Bibtex

@techreport{006e53631e6d43e39f077f255f373044,
title = "A New Aligned Simple German Corpus",
abstract = "{"}Leichte Sprache{"}, the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German -- German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment methods. We evaluate our alignments based on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by F1-score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under MIT license.",
keywords = "cs.CL",
author = "Vanessa Toborek and Moritz Busch and Malte Bo{\ss}ert and Christian Bauckhage and Pascal Welke",
note = "Accepted at ACL 2023",
year = "2022",
month = sep,
day = "2",
language = "English",
publisher = "Arxiv",
type = "WorkingPaper",
institution = "Arxiv",

}

RIS

TY  - UNPB
T1  - A New Aligned Simple German Corpus
AU  - Toborek, Vanessa
AU  - Busch, Moritz
AU  - Boßert, Malte
AU  - Bauckhage, Christian
AU  - Welke, Pascal
N1  - Accepted at ACL 2023
PY  - 2022/9/2
Y1  - 2022/9/2
N2  - "Leichte Sprache", the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German -- German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment methods. We evaluate our alignments based on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by F1-score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under MIT license.
AB  - "Leichte Sprache", the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German -- German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment methods. We evaluate our alignments based on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by F1-score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under MIT license.
KW  - cs.CL
M3  - Preprint
BT  - A New Aligned Simple German Corpus
PB  - Arxiv
ER  -