Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases

Keywords

Arabic, Infectious diseases, Machine Learning, Natural Language Processing, Twitter

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases. / Alsudias, Lama ; Rayson, Paul.
Proceedings of the 3rd Workshop on Arabic Corpus Linguistics: (WACL-3). Association for Computational Linguistics, 2019.

Harvard

Alsudias, L & Rayson, P 2019, Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases. in Proceedings of the 3rd Workshop on Arabic Corpus Linguistics: (WACL-3). Association for Computational Linguistics, The 3rd Workshop on Arabic Corpus Linguistics, Cardiff, United Kingdom, 22/07/19. <https://www.aclweb.org/anthology/W19-5604>

APA

Alsudias, L., & Rayson, P. (2019). Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases. In Proceedings of the 3rd Workshop on Arabic Corpus Linguistics: (WACL-3). Association for Computational Linguistics. https://www.aclweb.org/anthology/W19-5604

Vancouver

Alsudias L, Rayson P. Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases. In Proceedings of the 3rd Workshop on Arabic Corpus Linguistics: (WACL-3). Association for Computational Linguistics. 2019

Author

Alsudias, Lama ; Rayson, Paul. / Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases. Proceedings of the 3rd Workshop on Arabic Corpus Linguistics: (WACL-3). Association for Computational Linguistics, 2019.

Bibtex

@inproceedings{57f40bab39e44af5968a07b2b4f6dfb3,

title = "Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases",

abstract = "There is vast untapped potential in relation to the use of social media for monitoring the spread of infectious diseases around the world. Much previous research has focussed on English only, but the Arabic Twitter universe has been comparatively much less studied. Motivated by important issues related to levels of trust, quality and reliability of the information online, here we consider the variety of information sources. As a first step, we find that numerous accounts disseminate information via Arabic social media, and we group them into five types of sources: academic, media, government, health professional, and public. We perform two experiments. First, native speakers judge whether they can manually classify tweets into these five groups, and then we repeat the experiment using various Machine Learning (ML) classifiers. We find that inter-annotator agreement is 0.84 for this task, and ML classifiers are able to correctly identify the type of source of a tweet with 77.2% accuracy without knowledge of the user and their bio or profile, but with 99.9% accuracy when provided with this information.",

keywords = "Arabic, Infectious diseases, Machine Learning, Natural Language Processing, Twitter",

author = "Lama Alsudias and Paul Rayson",

year = "2019",

month = jul,

day = "22",

language = "English",

booktitle = "Proceedings of the 3rd Workshop on Arabic Corpus Linguistics",

publisher = "Association for Computational Linguistics",

note = "The 3rd Workshop on Arabic Corpus Linguistics : Held at the Corpus Linguistics 2019 Conference, WACL-3 ; Conference date: 22-07-2019 Through 22-07-2019",

url = "http://wp.lancs.ac.uk/wacl3/",

}

RIS

TY - GEN

T1 - Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases

AU - Alsudias, Lama

AU - Rayson, Paul

PY - 2019/7/22

Y1 - 2019/7/22

N2 - There is vast untapped potential in relation to the use of social media for monitoring the spread of infectious diseases around the world. Much previous research has focussed on English only, but the Arabic Twitter universe has been comparatively much less studied. Motivated by important issues related to levels of trust, quality and reliability of the information online, here we consider the variety of information sources. As a first step, we find that numerous accounts disseminate information via Arabic social media, and we group them into five types of sources: academic, media, government, health professional, and public. We perform two experiments. First, native speakers judge whether they can manually classify tweets into these five groups, and then we repeat the experiment using various Machine Learning (ML) classifiers. We find that inter-annotator agreement is 0.84 for this task, and ML classifiers are able to correctly identify the type of source of a tweet with 77.2% accuracy without knowledge of the user and their bio or profile, but with 99.9% accuracy when provided with this information.

AB - There is vast untapped potential in relation to the use of social media for monitoring the spread of infectious diseases around the world. Much previous research has focussed on English only, but the Arabic Twitter universe has been comparatively much less studied. Motivated by important issues related to levels of trust, quality and reliability of the information online, here we consider the variety of information sources. As a first step, we find that numerous accounts disseminate information via Arabic social media, and we group them into five types of sources: academic, media, government, health professional, and public. We perform two experiments. First, native speakers judge whether they can manually classify tweets into these five groups, and then we repeat the experiment using various Machine Learning (ML) classifiers. We find that inter-annotator agreement is 0.84 for this task, and ML classifiers are able to correctly identify the type of source of a tweet with 77.2% accuracy without knowledge of the user and their bio or profile, but with 99.9% accuracy when provided with this information.

KW - Arabic

KW - Infectious diseases

KW - Machine Learning

KW - Natural Language Processing

KW - Twitter

M3 - Conference contribution/Paper

BT - Proceedings of the 3rd Workshop on Arabic Corpus Linguistics

PB - Association for Computational Linguistics

T2 - The 3rd Workshop on Arabic Corpus Linguistics

Y2 - 22 July 2019 through 22 July 2019

ER -
