Final published version, 246 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
TY - GEN
T1 - The Nakba Lexicon
T2 - Building a Comprehensive Dataset from Palestinian Literature
AU - AbuHaija, Izza
AU - Al Mandhari, Salim
AU - El-Haj, Mo
AU - Sibony, Jonas
AU - Rayson, Paul
PY - 2025/1/20
Y1 - 2025/1/20
AB - This paper introduces the Nakba Lexicon, a comprehensive dataset derived from the poetry collection Asifa ‘Ala al-Iz‘aj (Sorry for the Disturbance) by Istiqlal Eid, a Palestinian poet from El-Birweh. Eid’s work poignantly reflects on themes of Palestinian identity, displacement, and resilience, serving as a resource for preserving linguistic and cultural heritage in the context of post-Nakba literature. The dataset is structured into ten thematic domains, including political terminology, memory and preservation, sensory and emotional lexicon, toponyms, nature, and external linguistic influences such as Hebrew, French, and English, thereby capturing the socio-political, emotional, and cultural dimensions of the Nakba. The Nakba Lexicon uniquely emphasises the contributions of women to Palestinian literary traditions, shedding light on often-overlooked narratives of resilience and cultural continuity. Advanced Natural Language Processing (NLP) techniques were employed to analyse the dataset, with fine-tuned pre-trained models such as ARABERT and MARBERT achieving F1-scores of 0.87 and 0.68 in language and lexical classification tasks, respectively, significantly outperforming traditional machine learning models. These results highlight the potential of domain-specific computational models to effectively analyse complex datasets, facilitating the preservation of marginalised voices. By bridging computational methods with cultural preservation, this study enhances the understanding of Palestinian linguistic heritage and contributes to broader efforts in documenting and analysing endangered narratives. The Nakba Lexicon paves the way for future interdisciplinary research, showcasing the role of NLP in addressing historical trauma, resilience, and cultural identity.
M3 - Conference contribution/Paper
SP - 37
EP - 47
BT - Proceedings of the first International Workshop on Nakba Narratives as Language Resources
A2 - Jarrar, Mustafa
A2 - Habash, Nizar
A2 - El-Haj, Mo
PB - Association for Computational Linguistics
CY - Abu Dhabi
ER -