Electronic data

  • 2022.coling-1.529

    Final published version, 215 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published
Publication date: 12/10/2022
Host publication: Proceedings of the 29th International Conference on Computational Linguistics
Publisher: International Committee on Computational Linguistics
Pages: 6057-6062
Number of pages: 6
Original language: English
Event: The 29th International Conference on Computational Linguistics - Gyeongju, Korea, Republic of
Duration: 12/10/2022 → 17/10/2022
Conference number: 29
https://coling2022.org/

Conference

Conference: The 29th International Conference on Computational Linguistics
Abbreviated title: COLING2022
Country/Territory: Korea, Republic of
City: Gyeongju
Period: 12/10/22 → 17/10/22

Publication series

Name: COLING Proceedings
Publisher: International Committee on Computational Linguistics
Volume: 2022
ISSN (electronic): 2951-2093

Abstract

Lexical simplification (LS) is the task of automatically replacing complex words with simpler ones, making texts more accessible to various target populations (e.g. individuals with low literacy, individuals with learning disabilities, second language learners). To train and test models, LS systems usually require corpora that feature complex words in context along with their candidate substitutions.
To continue improving the performance of LS systems, we introduce ALEXSIS-PT, a novel multi-candidate dataset for Brazilian Portuguese LS containing 9,605 candidate substitutions for 387 complex words. ALEXSIS-PT has been compiled following the ALEXSIS protocol for Spanish, opening exciting new avenues for cross-lingual models. ALEXSIS-PT is the first LS multi-candidate dataset that contains Brazilian newspaper articles. We evaluated four models for substitute generation on this dataset, namely mDistilBERT, mBERT, XLM-R, and BERTimbau. BERTimbau achieved the highest performance across all evaluation metrics.