Is it Offensive or Abusive? An Empirical Study of Hateful Language Detection of Arabic Social Media Texts?

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Is it Offensive or Abusive? An Empirical Study of Hateful Language Detection of Arabic Social Media Texts? / Al Mandhari, Salim; El-Haj, Mahmoud; Rayson, Paul.
The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS). ed. / Ruslan Mitkov. Lancaster: Lancaster University, 2024.

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Harvard

Al Mandhari, S, El-Haj, M & Rayson, P 2024, Is it Offensive or Abusive? An Empirical Study of Hateful Language Detection of Arabic Social Media Texts? in R Mitkov (ed.), The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS). Lancaster University, Lancaster.

APA

Al Mandhari, S., El-Haj, M., & Rayson, P. (2024). Is it Offensive or Abusive? An Empirical Study of Hateful Language Detection of Arabic Social Media Texts? In R. Mitkov (Ed.), The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS). Lancaster University.

Vancouver

Al Mandhari S, El-Haj M, Rayson P. Is it Offensive or Abusive? An Empirical Study of Hateful Language Detection of Arabic Social Media Texts? In Mitkov R, editor, The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS). Lancaster: Lancaster University. 2024.

Author

Al Mandhari, Salim ; El-Haj, Mahmoud ; Rayson, Paul. / Is it Offensive or Abusive? An Empirical Study of Hateful Language Detection of Arabic Social Media Texts?. The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS). editor / Ruslan Mitkov. Lancaster: Lancaster University, 2024.

BibTeX

@inproceedings{8b2022507efc4e40afe541329fd589a2,
title = "Is it Offensive or Abusive?: An Empirical Study of Hateful Language Detection of Arabic Social Media Texts?",
abstract = "Among the many subjects studied in Sentiment Analysis, widespread offensive and abusive language on social media has triggered interest in reducing its risks to users, children in particular. This paper centres on distinguishing between offensive and abusive language in the detection of hateful content in Arabic social media texts, employing various machine learning and deep learning techniques. The techniques include Na{\"i}ve Bayes (NB), Support Vector Machine (SVM), fastText, Keras, and XLM-RoBERTa multilingual embeddings, which demonstrated superior performance compared to other statistical machine learning methods and to different kinds of embeddings such as fastText. The methods were applied to two separate corpora of YouTube comments totalling 47K comments. The results demonstrated that all models except NB reached an accuracy of 82%. It was also shown that word tri-grams enhance classification performance, though other tuning techniques, such as TF-IDF and grid search, were also applied. The linguistic findings, aimed at distinguishing between offensive and abusive language, were consistent with the machine learning (ML) performance, with the models effectively classifying the two distinct classes of sentiment: offensive and abusive.",
author = "{Al Mandhari}, Salim and Mahmoud El-Haj and Paul Rayson",
year = "2024",
month = jul,
day = "29",
language = "English",
editor = "Ruslan Mitkov",
booktitle = "The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS)",
publisher = "Lancaster University",

}

RIS

TY - GEN

T1 - Is it Offensive or Abusive?

T2 - An Empirical Study of Hateful Language Detection of Arabic Social Media Texts?

AU - Al Mandhari, Salim

AU - El-Haj, Mahmoud

AU - Rayson, Paul

PY - 2024/7/29

Y1 - 2024/7/29

N2 - Among the many subjects studied in Sentiment Analysis, widespread offensive and abusive language on social media has triggered interest in reducing its risks to users, children in particular. This paper centres on distinguishing between offensive and abusive language in the detection of hateful content in Arabic social media texts, employing various machine learning and deep learning techniques. The techniques include Naïve Bayes (NB), Support Vector Machine (SVM), fastText, Keras, and XLM-RoBERTa multilingual embeddings, which demonstrated superior performance compared to other statistical machine learning methods and to different kinds of embeddings such as fastText. The methods were applied to two separate corpora of YouTube comments totalling 47K comments. The results demonstrated that all models except NB reached an accuracy of 82%. It was also shown that word tri-grams enhance classification performance, though other tuning techniques, such as TF-IDF and grid search, were also applied. The linguistic findings, aimed at distinguishing between offensive and abusive language, were consistent with the machine learning (ML) performance, with the models effectively classifying the two distinct classes of sentiment: offensive and abusive.

AB - Among the many subjects studied in Sentiment Analysis, widespread offensive and abusive language on social media has triggered interest in reducing its risks to users, children in particular. This paper centres on distinguishing between offensive and abusive language in the detection of hateful content in Arabic social media texts, employing various machine learning and deep learning techniques. The techniques include Naïve Bayes (NB), Support Vector Machine (SVM), fastText, Keras, and XLM-RoBERTa multilingual embeddings, which demonstrated superior performance compared to other statistical machine learning methods and to different kinds of embeddings such as fastText. The methods were applied to two separate corpora of YouTube comments totalling 47K comments. The results demonstrated that all models except NB reached an accuracy of 82%. It was also shown that word tri-grams enhance classification performance, though other tuning techniques, such as TF-IDF and grid search, were also applied. The linguistic findings, aimed at distinguishing between offensive and abusive language, were consistent with the machine learning (ML) performance, with the models effectively classifying the two distinct classes of sentiment: offensive and abusive.

M3 - Conference contribution/Paper

BT - The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS)

A2 - Mitkov, Ruslan

PB - Lancaster University

CY - Lancaster

ER -
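
The abstract describes a classification setup in which word tri-gram TF-IDF features, tuned with grid search, feed models such as SVM. The sketch below is an illustrative reconstruction of that kind of pipeline, not the authors' code: the corpus file, column names, and hyper-parameter grid are assumptions, and scikit-learn stands in for whichever toolkit was actually used.

# Minimal sketch (assumed setup, not the published implementation):
# word tri-gram TF-IDF features + linear SVM, tuned via grid search.
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

# Hypothetical corpus of YouTube comments labelled "offensive" or "abusive".
df = pd.read_csv("arabic_comments.csv")  # assumed columns: text, label
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# TF-IDF over word n-grams up to tri-grams, followed by a linear SVM.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="word", ngram_range=(1, 3))),
    ("svm", LinearSVC()),
])

# Small, illustrative grid: n-gram span and SVM regularisation strength.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2), (1, 3)],
    "svm__C": [0.1, 1, 10],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(classification_report(y_test, search.predict(X_test)))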