Electronic data

  • 2024.lrec-main.473

    Final published version, 248 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

DORE: A Dataset for Portuguese Definition Generation

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

DORE: A Dataset for Portuguese Definition Generation. / Furtado, Anna Beatriz Dimas; Ranasinghe, Tharindu; Blain, Frederic et al.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ed. / Nicoletta Calzolari; Min-Yen Kan; Veronique Hoste; Alessandro Lenci; Sakriani Sakti; Nianwen Xue. ELRA and ICCL, 2024. p. 5315-5322.

Harvard

Furtado, ABD, Ranasinghe, T, Blain, F & Mitkov, R 2024, DORE: A Dataset for Portuguese Definition Generation. in N Calzolari, M-Y Kan, V Hoste, A Lenci, S Sakti & N Xue (eds), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ELRA and ICCL, pp. 5315-5322, The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Torino, Italy, 20/05/24. <https://aclanthology.org/2024.lrec-main.473/>

APA

Furtado, A. B. D., Ranasinghe, T., Blain, F., & Mitkov, R. (2024). DORE: A Dataset for Portuguese Definition Generation. In N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, & N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 5315-5322). ELRA and ICCL. https://aclanthology.org/2024.lrec-main.473/

Vancouver

Furtado ABD, Ranasinghe T, Blain F, Mitkov R. DORE: A Dataset for Portuguese Definition Generation. In Calzolari N, Kan MY, Hoste V, Lenci A, Sakti S, Xue N, editors, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ELRA and ICCL. 2024. p. 5315-5322

Author

Furtado, Anna Beatriz Dimas ; Ranasinghe, Tharindu ; Blain, Frederic et al. / DORE : A Dataset for Portuguese Definition Generation. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). editor / Nicoletta Calzolari ; Min-Yen Kan ; Veronique Hoste ; Alessandro Lenci ; Sakriani Sakti ; Nianwen Xue. ELRA and ICCL, 2024. pp. 5315-5322

Bibtex

@inproceedings{8d0a788ecabc4986b0f7f90acc81f64e,
title = "DORE: A Dataset for Portuguese Definition Generation",
abstract = "Definition modelling (DM) is the task of automatically generating a dictionary definition of a specific word. Computational systems that are capable of DM can have numerous applications benefiting a wide range of audiences. As DM is considered a supervised natural language generation problem, these systems require large annotated datasets to train the machine learning (ML) models. Several DM datasets have been released for English and other high-resource languages. While Portuguese is considered a mid/high-resource language in most natural language processing tasks and is spoken by more than 200 million native speakers, there is no DM dataset available for Portuguese. In this research, we fill this gap by introducing DORE; the first dataset for Definition MOdelling for PoRtuguEse containing more than 100,000 definitions. We also evaluate several deep learning based DM models on DORE and report the results. The dataset and the findings of this paper will facilitate research and study of Portuguese in wider contexts.",
author = "Furtado, {Anna Beatriz Dimas} and Tharindu Ranasinghe and Frederic Blain and Ruslan Mitkov",
year = "2024",
month = may,
day = "20",
language = "English",
isbn = "9782493814104",
pages = "5315--5322",
editor = "Nicoletta Calzolari and Min-Yen Kan and Veronique Hoste and Alessandro Lenci and Sakriani Sakti and Nianwen Xue",
booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
publisher = "ELRA and ICCL",
note = "The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024; Conference date: 20-05-2024 through 25-05-2024",
url = "https://lrec-coling-2024.org/",
}

RIS

TY - GEN

T1 - DORE: A Dataset for Portuguese Definition Generation

T2 - The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation

AU - Furtado, Anna Beatriz Dimas

AU - Ranasinghe, Tharindu

AU - Blain, Frederic

AU - Mitkov, Ruslan

PY - 2024/5/20

Y1 - 2024/5/20

N2 - Definition modelling (DM) is the task of automatically generating a dictionary definition of a specific word. Computational systems that are capable of DM can have numerous applications benefiting a wide range of audiences. As DM is considered a supervised natural language generation problem, these systems require large annotated datasets to train the machine learning (ML) models. Several DM datasets have been released for English and other high-resource languages. While Portuguese is considered a mid/high-resource language in most natural language processing tasks and is spoken by more than 200 million native speakers, there is no DM dataset available for Portuguese. In this research, we fill this gap by introducing DORE; the first dataset for Definition MOdelling for PoRtuguEse containing more than 100,000 definitions. We also evaluate several deep learning based DM models on DORE and report the results. The dataset and the findings of this paper will facilitate research and study of Portuguese in wider contexts.

AB - Definition modelling (DM) is the task of automatically generating a dictionary definition of a specific word. Computational systems that are capable of DM can have numerous applications benefiting a wide range of audiences. As DM is considered a supervised natural language generation problem, these systems require large annotated datasets to train the machine learning (ML) models. Several DM datasets have been released for English and other high-resource languages. While Portuguese is considered a mid/high-resource language in most natural language processing tasks and is spoken by more than 200 million native speakers, there is no DM dataset available for Portuguese. In this research, we fill this gap by introducing DORE; the first dataset for Definition MOdelling for PoRtuguEse containing more than 100,000 definitions. We also evaluate several deep learning based DM models on DORE and report the results. The dataset and the findings of this paper will facilitate research and study of Portuguese in wider contexts.

M3 - Conference contribution/Paper

SN - 9782493814104

SP - 5315

EP - 5322

BT - Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

A2 - Calzolari, Nicoletta

A2 - Kan, Min-Yen

A2 - Hoste, Veronique

A2 - Lenci, Alessandro

A2 - Sakti, Sakriani

A2 - Xue, Nianwen

PB - ELRA and ICCL

Y2 - 20 May 2024 through 25 May 2024

ER -