Electronic data

  • 2022.coling-1.529

    Final published version, 215 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN; Conference contribution/Paper; peer-review

Published

Standard

ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification. / North, Kai; Zampieri, Marcos; Ranasinghe, Tharindu.
Proceedings of the 29th International Conference on Computational Linguistics. International Committee on Computational Linguistics, 2022. p. 6057-6062 (COLING Proceedings; Vol. 2022).

Harvard

North, K, Zampieri, M & Ranasinghe, T 2022, ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification. in Proceedings of the 29th International Conference on Computational Linguistics. COLING Proceedings, vol. 2022, International Committee on Computational Linguistics, pp. 6057-6062, The 29th International Conference on Computational Linguistics, Gyeongju, Korea, Republic of, 12/10/22. <https://aclanthology.org/2022.coling-1.529/>

APA

North, K., Zampieri, M., & Ranasinghe, T. (2022). ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification. In Proceedings of the 29th International Conference on Computational Linguistics (pp. 6057-6062). (COLING Proceedings; Vol. 2022). International Committee on Computational Linguistics. https://aclanthology.org/2022.coling-1.529/

Vancouver

North K, Zampieri M, Ranasinghe T. ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification. In Proceedings of the 29th International Conference on Computational Linguistics. International Committee on Computational Linguistics. 2022. p. 6057-6062. (COLING Proceedings).

Author

North, Kai ; Zampieri, Marcos ; Ranasinghe, Tharindu. / ALEXSIS-PT : A New Resource for Portuguese Lexical Simplification. Proceedings of the 29th International Conference on Computational Linguistics. International Committee on Computational Linguistics, 2022. pp. 6057-6062 (COLING Proceedings).

Bibtex

@inproceedings{7948ef0da43c4bbda40422944b186c02,
title = "ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification",
abstract = "Lexical simplification (LS) is the task of automatically replacing complex words for easier ones making texts more accessible to various target populations (e.g. individuals with low literacy, individuals with learning disabilities, second language learners). To train and test models, LS systems usually require corpora that feature complex words in context along with their candidate substitutions. To continue improving the performance of LS systems we introduce ALEXSIS-PT, a novel multi-candidate dataset for Brazilian Portuguese LS containing 9,605 candidate substitutions for 387 complex words. ALEXSIS-PT has been compiled following the ALEXSIS protocol for Spanish opening exciting new avenues for crosslingual models. ALEXSIS-PT is the first LS multi-candidate dataset that contains Brazilian newspaper articles. We evaluated four models for substitute generation on this dataset, namely mDistilBERT, mBERT, XLM-R, and BERTimbau. BERTimbau achieved the highest performance across all evaluation metrics.",
author = "Kai North and Marcos Zampieri and Tharindu Ranasinghe",
year = "2022",
month = oct,
day = "12",
language = "English",
series = "COLING Proceedings",
publisher = "International Committee on Computational Linguistics",
pages = "6057--6062",
booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
note = "The 29th International Conference on Computational Linguistics, COLING2022 ; Conference date: 12-10-2022 Through 17-10-2022",
url = "https://coling2022.org/",

}

RIS

TY - GEN

T1 - ALEXSIS-PT

T2 - The 29th International Conference on Computational Linguistics

AU - North, Kai

AU - Zampieri, Marcos

AU - Ranasinghe, Tharindu

N1 - Conference code: 29

PY - 2022/10/12

Y1 - 2022/10/12

N2 - Lexical simplification (LS) is the task of automatically replacing complex words for easier ones making texts more accessible to various target populations (e.g. individuals with low literacy, individuals with learning disabilities, second language learners). To train and test models, LS systems usually require corpora that feature complex words in context along with their candidate substitutions. To continue improving the performance of LS systems we introduce ALEXSIS-PT, a novel multi-candidate dataset for Brazilian Portuguese LS containing 9,605 candidate substitutions for 387 complex words. ALEXSIS-PT has been compiled following the ALEXSIS protocol for Spanish opening exciting new avenues for crosslingual models. ALEXSIS-PT is the first LS multi-candidate dataset that contains Brazilian newspaper articles. We evaluated four models for substitute generation on this dataset, namely mDistilBERT, mBERT, XLM-R, and BERTimbau. BERTimbau achieved the highest performance across all evaluation metrics.

AB - Lexical simplification (LS) is the task of automatically replacing complex words for easier ones making texts more accessible to various target populations (e.g. individuals with low literacy, individuals with learning disabilities, second language learners). To train and test models, LS systems usually require corpora that feature complex words in context along with their candidate substitutions. To continue improving the performance of LS systems we introduce ALEXSIS-PT, a novel multi-candidate dataset for Brazilian Portuguese LS containing 9,605 candidate substitutions for 387 complex words. ALEXSIS-PT has been compiled following the ALEXSIS protocol for Spanish opening exciting new avenues for crosslingual models. ALEXSIS-PT is the first LS multi-candidate dataset that contains Brazilian newspaper articles. We evaluated four models for substitute generation on this dataset, namely mDistilBERT, mBERT, XLM-R, and BERTimbau. BERTimbau achieved the highest performance across all evaluation metrics.

M3 - Conference contribution/Paper

T3 - COLING Proceedings

SP - 6057

EP - 6062

BT - Proceedings of the 29th International Conference on Computational Linguistics

PB - International Committee on Computational Linguistics

Y2 - 12 October 2022 through 17 October 2022

ER -