Electronic data

  • 2025.nakbanlp-1.5

    Final published version, 246 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

The Nakba Lexicon: Building a Comprehensive Dataset from Palestinian Literature

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN > Conference contribution/Paper > peer-review

Published

Standard

The Nakba Lexicon: Building a Comprehensive Dataset from Palestinian Literature. / AbuHaija, Izza; Al Mandhari, Salim; El-Haj, Mo et al.
Proceedings of the first International Workshop on Nakba Narratives as Language Resources. ed. / Mustafa Jarrar; Habash Habash; Mo El-Haj. Abu Dhabi: Association for Computational Linguistics, 2025. p. 37-47.

Harvard

AbuHaija, I, Al Mandhari, S, El-Haj, M, Sibony, J & Rayson, P 2025, The Nakba Lexicon: Building a Comprehensive Dataset from Palestinian Literature. in M Jarrar, H Habash & M El-Haj (eds), Proceedings of the first International Workshop on Nakba Narratives as Language Resources. Association for Computational Linguistics, Abu Dhabi, pp. 37-47. <https://aclanthology.org/2025.nakbanlp-1.5/>

APA

AbuHaija, I., Al Mandhari, S., El-Haj, M., Sibony, J., & Rayson, P. (2025). The Nakba Lexicon: Building a Comprehensive Dataset from Palestinian Literature. In M. Jarrar, H. Habash, & M. El-Haj (Eds.), Proceedings of the first International Workshop on Nakba Narratives as Language Resources (pp. 37-47). Association for Computational Linguistics. https://aclanthology.org/2025.nakbanlp-1.5/

Vancouver

AbuHaija I, Al Mandhari S, El-Haj M, Sibony J, Rayson P. The Nakba Lexicon: Building a Comprehensive Dataset from Palestinian Literature. In Jarrar M, Habash H, El-Haj M, editors, Proceedings of the first International Workshop on Nakba Narratives as Language Resources. Abu Dhabi: Association for Computational Linguistics. 2025. p. 37-47

Author

AbuHaija, Izza ; Al Mandhari, Salim ; El-Haj, Mo et al. / The Nakba Lexicon : Building a Comprehensive Dataset from Palestinian Literature. Proceedings of the first International Workshop on Nakba Narratives as Language Resources. editor / Mustafa Jarrar ; Habash Habash ; Mo El-Haj. Abu Dhabi : Association for Computational Linguistics, 2025. pp. 37-47

Bibtex

@inproceedings{7b10b81526164359b317a8c58bf39319,
title = "The Nakba Lexicon: Building a Comprehensive Dataset from Palestinian Literature",
abstract = "This paper introduces the Nakba Lexicon, a comprehensive dataset derived from the poetry collection Asifa {\textquoteleft}Ala al-Iz{\textquoteleft}aj (Sorry for the Disturbance) by Istiqlal Eid, a Palestinian poet from El-Birweh. Eid{\textquoteright}s work poignantly reflects on themes of Palestinian identity, displacement, and resilience, serving as a resource for preserving linguistic and cultural heritage in the context of post-Nakba literature. The dataset is structured into ten thematic domains, including political terminology, memory and preservation, sensory and emotional lexicon, toponyms, nature, and external linguistic influences such as Hebrew, French, and English, thereby capturing the socio-political, emotional, and cultural dimensions of the Nakba. The Nakba Lexicon uniquely emphasises the contributions of women to Palestinian literary traditions, shedding light on often-overlooked narratives of resilience and cultural continuity. Advanced Natural Language Processing (NLP) techniques were employed to analyse the dataset, with fine-tuned pre-trained models such as ARABERT and MARBERT achieving F1-scores of 0.87 and 0.68 in language and lexical classification tasks, respectively, significantly outperforming traditional machine learning models. These results highlight the potential of domain-specific computational models to effectively analyse complex datasets, facilitating the preservation of marginalised voices. By bridging computational methods with cultural preservation, this study enhances the understanding of Palestinian linguistic heritage and contributes to broader efforts in documenting and analysing endangered narratives. The Nakba Lexicon paves the way for future interdisciplinary research, showcasing the role of NLP in addressing historical trauma, resilience, and cultural identity.",
author = "Izza AbuHaija and {Al Mandhari}, Salim and Mo El-Haj and Jonas Sibony and Paul Rayson",
year = "2025",
month = jan,
day = "20",
language = "English",
pages = "37--47",
editor = "Mustafa Jarrar and Habash Habash and Mo El-Haj",
booktitle = "Proceedings of the first International Workshop on Nakba Narratives as Language Resources",
publisher = "Association for Computational Linguistics",
address = "Abu Dhabi",
url = "https://aclanthology.org/2025.nakbanlp-1.5/",
}

RIS

TY - GEN

T1 - The Nakba Lexicon

T2 - Building a Comprehensive Dataset from Palestinian Literature

AU - AbuHaija, Izza

AU - Al Mandhari, Salim

AU - El-Haj, Mo

AU - Sibony, Jonas

AU - Rayson, Paul

PY - 2025/1/20

Y1 - 2025/1/20

AB - This paper introduces the Nakba Lexicon, a comprehensive dataset derived from the poetry collection Asifa ‘Ala al-Iz‘aj (Sorry for the Disturbance) by Istiqlal Eid, a Palestinian poet from El-Birweh. Eid’s work poignantly reflects on themes of Palestinian identity, displacement, and resilience, serving as a resource for preserving linguistic and cultural heritage in the context of post-Nakba literature. The dataset is structured into ten thematic domains, including political terminology, memory and preservation, sensory and emotional lexicon, toponyms, nature, and external linguistic influences such as Hebrew, French, and English, thereby capturing the socio-political, emotional, and cultural dimensions of the Nakba. The Nakba Lexicon uniquely emphasises the contributions of women to Palestinian literary traditions, shedding light on often-overlooked narratives of resilience and cultural continuity. Advanced Natural Language Processing (NLP) techniques were employed to analyse the dataset, with fine-tuned pre-trained models such as ARABERT and MARBERT achieving F1-scores of 0.87 and 0.68 in language and lexical classification tasks, respectively, significantly outperforming traditional machine learning models. These results highlight the potential of domain-specific computational models to effectively analyse complex datasets, facilitating the preservation of marginalised voices. By bridging computational methods with cultural preservation, this study enhances the understanding of Palestinian linguistic heritage and contributes to broader efforts in documenting and analysing endangered narratives. The Nakba Lexicon paves the way for future interdisciplinary research, showcasing the role of NLP in addressing historical trauma, resilience, and cultural identity.

M3 - Conference contribution/Paper

SP - 37

EP - 47

BT - Proceedings of the first International Workshop on Nakba Narratives as Language Resources

A2 - Jarrar, Mustafa

A2 - Habash, Habash

A2 - El-Haj, Mo

PB - Association for Computational Linguistics

CY - Abu Dhabi

ER -