Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN, Conference contribution/Paper

Published
Publication date: 22/07/2019
Host publication: Proceedings of the 3rd Workshop on Arabic Corpus Linguistics (WACL-3)
Publisher: Association for Computational Linguistics
Number of pages: 9
Original language: English
Event: The 3rd Workshop on Arabic Corpus Linguistics, held at the Corpus Linguistics 2019 Conference - Cardiff University, Cardiff, United Kingdom
Duration: 22/07/2019 to 22/07/2019
http://wp.lancs.ac.uk/wacl3/

Workshop

Workshop: The 3rd Workshop on Arabic Corpus Linguistics
Abbreviated title: WACL-3
Country: United Kingdom
City: Cardiff
Period: 22/07/19 to 22/07/19
Internet address

Abstract

There is vast untapped potential in the use of social media for monitoring the spread of infectious diseases around the world. Much previous research has focussed on English only, and the Arabic Twitter universe has been studied far less. Motivated by important issues related to the trust, quality and reliability of information online, here we consider the variety of information sources. As a first step, we find that numerous accounts disseminate information via Arabic social media, and we group them into five types of source: academic, media, government, health professional, and public. We perform two experiments. First, native speakers manually classify tweets into these five groups to test whether the task is feasible for humans, and then we repeat the experiment using various Machine Learning (ML) classifiers. We find that inter-annotator agreement is 0.84 for this task, and that ML classifiers are able to correctly identify the type of source of a tweet with 77.2% accuracy without knowledge of the user and their bio or profile, but with 99.9% accuracy when provided with this information.
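
The abstract reports an inter-annotator agreement score and classifier accuracies without stating here how they were computed. As an illustration only, the sketch below shows one common way to obtain such figures for a five-way labelling task: Cohen's kappa for two annotators, and plain accuracy for a classifier. The choice of Cohen's kappa, the function names, and the toy labels are all assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter

# The five source types named in the abstract.
SOURCE_TYPES = ["academic", "media", "government", "health professional", "public"]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: the paper's agreement measure is not given in the abstract;
    kappa is used here only as a common choice.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Proportion of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def accuracy(gold, predicted):
    """Fraction of items whose predicted source type matches the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical toy labels for six tweets (not data from the paper).
annotator_a = ["media", "public", "government", "public", "academic", "health professional"]
annotator_b = ["media", "public", "government", "media", "academic", "health professional"]

kappa = cohen_kappa(annotator_a, annotator_b)
acc = accuracy(annotator_a, annotator_b)
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one class (for example, public accounts) dominates the data.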