Electronic data

  • 2024.lrec-main.703

    Final published version, 323 KB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language. / Plum, Alistair; Ranasinghe, Tharindu; Purschke, Christoph.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ed. / Nicoletta Calzolari; Min-Yen Kan; Veronique Hoste; Alessandro Lenci; Sakriani Sakti; Nianwen Xue. ELRA and ICCL, 2024. p. 7982-7992.

Harvard

Plum, A, Ranasinghe, T & Purschke, C 2024, Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language. in N Calzolari, M-Y Kan, V Hoste, A Lenci, S Sakti & N Xue (eds), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ELRA and ICCL, pp. 7982-7992, The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Torino, Italy, 20/05/24. <https://aclanthology.org/2024.lrec-main.703>

APA

Plum, A., Ranasinghe, T., & Purschke, C. (2024). Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language. In N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, & N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 7982-7992). ELRA and ICCL. https://aclanthology.org/2024.lrec-main.703

Vancouver

Plum A, Ranasinghe T, Purschke C. Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language. In Calzolari N, Kan MY, Hoste V, Lenci A, Sakti S, Xue N, editors, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ELRA and ICCL. 2024. p. 7982-7992

Author

Plum, Alistair ; Ranasinghe, Tharindu ; Purschke, Christoph. / Guided Distant Supervision for Multilingual Relation Extraction Data : Adapting to a New Language. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). editor / Nicoletta Calzolari ; Min-Yen Kan ; Veronique Hoste ; Alessandro Lenci ; Sakriani Sakti ; Nianwen Xue. ELRA and ICCL, 2024. pp. 7982-7992

Bibtex

@inproceedings{fd97a27a160a48d28ed840f04bb8e6f7,
title = "Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language",
abstract = "Relation extraction is essential for extracting and understanding biographical information in the context of digital humanities and related subjects. There is a growing interest in the community to build datasets capable of training machine learning models to extract relationships. However, annotating such datasets can be expensive and time-consuming, in addition to being limited to English. This paper applies guided distant supervision to create a large biographical relationship extraction dataset for German. Our dataset, composed of more than 80,000 instances for nine relationship types, is the largest biographical German relationship extraction dataset. We also create a manually annotated dataset with 2000 instances to evaluate the models and release it together with the dataset compiled using guided distant supervision. We train several state-of-the-art machine learning models on the automatically created dataset and release them as well. Furthermore, we experiment with multilingual and cross-lingual zero-shot experiments that could benefit many low-resource languages.",
author = "Alistair Plum and Tharindu Ranasinghe and Christoph Purschke",
year = "2024",
month = may,
day = "20",
language = "English",
pages = "7982--7992",
editor = "Nicoletta Calzolari and Min-Yen Kan and Veronique Hoste and Alessandro Lenci and Sakriani Sakti and Nianwen Xue",
booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
publisher = "ELRA and ICCL",
note = " The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 ; Conference date: 20-05-2024 Through 25-05-2024",
url = "https://lrec-coling-2024.org/",

}

RIS

TY - GEN

T1 - Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language

T2 - The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation

AU - Plum, Alistair

AU - Ranasinghe, Tharindu

AU - Purschke, Christoph

PY - 2024/5/20

Y1 - 2024/5/20

N2 - Relation extraction is essential for extracting and understanding biographical information in the context of digital humanities and related subjects. There is a growing interest in the community to build datasets capable of training machine learning models to extract relationships. However, annotating such datasets can be expensive and time-consuming, in addition to being limited to English. This paper applies guided distant supervision to create a large biographical relationship extraction dataset for German. Our dataset, composed of more than 80,000 instances for nine relationship types, is the largest biographical German relationship extraction dataset. We also create a manually annotated dataset with 2000 instances to evaluate the models and release it together with the dataset compiled using guided distant supervision. We train several state-of-the-art machine learning models on the automatically created dataset and release them as well. Furthermore, we experiment with multilingual and cross-lingual zero-shot experiments that could benefit many low-resource languages.

M3 - Conference contribution/Paper

SP - 7982

EP - 7992

BT - Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

A2 - Calzolari, Nicoletta

A2 - Kan, Min-Yen

A2 - Hoste, Veronique

A2 - Lenci, Alessandro

A2 - Sakti, Sakriani

A2 - Xue, Nianwen

PB - ELRA and ICCL

Y2 - 20 May 2024 through 25 May 2024

ER -