
Electronic data

  • 2024.readi-1.4

    Final published version, 195 KB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License


An Extensible Massively Multilingual Lexical Simplification Pipeline Dataset using the MultiLS Framework

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published
  • Matthew Shardlow
  • Fernando Alva-Manchego
  • Riza Theresa Batista-Navarro
  • Stefan Bott
  • Saul Calderon Ramirez
  • Rémi Cardon
  • Thomas François
  • Akio Hayakawa
  • Andrea Horbach
  • Anna Hülsing
  • Yusuke Ide
  • Joseph Marvin Imperia
  • Adam Nohej
  • Kai North
  • Laura Occhipinti
  • Nelson Peréz Rojas
  • Md Nishat Raihan
  • Martin Solis Salazar
  • Marcos Zampieri
  • Horacio Saggion
Publication date: 20/05/2024
Host publication: Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024
Editors: Rodrigo Wilkens, Rémi Cardon, Amalia Todirascu, Núria Gala
Publisher: ELRA and ICCL
Pages: 38-46
Number of pages: 9
ISBN (electronic): 9782493814340
Original language: English
Event: 3rd Workshop on Tools and Resources for REAding DIfficulties - Turin, Italy
Duration: 20/05/2024 to 20/05/2024
https://cental.uclouvain.be/readi2024/

Workshop

Workshop: 3rd Workshop on Tools and Resources for REAding DIfficulties
Abbreviated title: READI
Country/Territory: Italy
City: Turin
Period: 20/05/24 to 20/05/24
Internet address: https://cental.uclouvain.be/readi2024/


Abstract

We present preliminary findings on the MultiLS dataset, developed to support the 2024 Multilingual Lexical Simplification Pipeline (MLSP) Shared Task. The dataset currently comprises 300 instances of lexical complexity prediction and lexical simplification across 10 languages. In this paper, we (1) describe the annotation protocol to support the contribution of future datasets and (2) present summary statistics on the data gathered so far. Multilingual lexical simplification can help low-ability readers engage with otherwise difficult texts in their native, often low-resourced, languages.