Electronic data

  • 2022LamaPhD

    Final published version, 4.1 MB, PDF document

    Available under license: CC BY-NC-SA: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

Using Arabic Twitter to support analysis of the spread of Infectious Diseases

Research output: Thesis (Doctoral Thesis)

Published
Publication date: 2022
Qualification: PhD
Awarding Institution
Supervisors/Advisors
Thesis sponsors
  • King Saud University
Award date: 9/03/2022
Publisher
  • Lancaster University
Original language: English

Abstract

This study investigates how Arabic social media content, especially Twitter, can be used to measure the incidence of infectious diseases. People use social media applications such as Twitter to find news about diseases and to express their opinions and feelings about them. As a result, a vast amount of information is available that NLP researchers could exploit for a wide range of analyses, despite the informal writing style of social media. Systematic monitoring of social media posts (infodemiology or infoveillance) could help to detect misinformation outbreaks, reduce reporting lag time, and provide an independent, complementary source of data alongside traditional surveillance approaches. However, there has been little research on analysing Arabic tweets for health surveillance purposes, owing to the lack of Arabic social media datasets compared with what is available for English and some other languages. It was therefore necessary for us to create our own corpus. In addition, building ontologies is a crucial part of the semantic web endeavour. In recent years, research interest in supporting languages such as Arabic in NLP has grown rapidly, but there has been very little research on medical ontologies for Arabic.

In this thesis, the first and largest Arabic Twitter dataset in the area of health surveillance was created for training and testing in the research studies presented. Machine learning algorithms, together with NLP techniques tailored to Arabic, were used to classify tweets into five categories: academic, media, government, health professional, and the public. This classification assists reliability and trust judgements by taking into account the source of the information alongside the content of tweets. An Arabic Infectious Diseases Ontology was presented and evaluated as part of a new method to bridge between formal and informal descriptions of infectious diseases. Several qualitative and quantitative studies were performed to analyse Arabic tweets written during the COVID-19 pandemic, to show how public health organisations can learn from social media. Finally, a system was presented that measures the spread of two infectious diseases based on our ontology, to illustrate what quantitative patterns and qualitative themes can be extracted.