Home > Research > Publications & Outputs > Guided Distant Supervision for Multilingual Rel...

Electronic data

  • 2024.lrec-main.703

    Final published version, 323 KB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Links

View graph of relations

Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSNConference contribution/Paperpeer-review

Published
Close
Publication date20/05/2024
Host publicationProceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
EditorsNicoletta Calzolari, Min-Yen Kan, Veronique Haste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
PublisherELRA and ICCL
Pages7982-7992
Number of pages11
ISBN (electronic)9782493814104
<mark>Original language</mark>English
Event The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation - Torino, Italy
Duration: 20/05/202425/05/2024
https://lrec-coling-2024.org/

Conference

Conference The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Abbreviated titleLREC-COLING 2024
Country/TerritoryItaly
CityTorino
Period20/05/2425/05/24
Internet address

Conference

Conference The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Abbreviated titleLREC-COLING 2024
Country/TerritoryItaly
CityTorino
Period20/05/2425/05/24
Internet address

Abstract

Relation extraction is essential for extracting and understanding biographical information in the context of digital humanities and related subjects. There is a growing interest in the community to build datasets capable of training machine learning models to extract relationships. However, annotating such datasets can be expensive and time-consuming, in addition to being limited to English. This paper applies guided distant supervision to create a large biographical relationship extraction dataset for German. Our dataset, composed of more than 80,000 instances for nine relationship types, is the largest biographical German relationship extraction dataset. We also create a manually annotated dataset with 2000 instances to evaluate the models and release it together with the dataset compiled using guided distant supervision. We train several state-of-the-art machine learning models on the automatically created dataset and release them as well. Furthermore, we experiment with multilingual and cross-lingual zero-shot experiments that could benefit many low-resource languages.