Standard
Offensive Language Identification in Transliterated and Code-Mixed Bangla. / Raihan, Md Nishat; Tanmoy, Umma; Islam, Anika Binte et al.
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023). ed. / Firoj Alam; Sudipta Kar; Shammur Absar Chowdhury; Farig Sadeque; Ruhul Amin. Association for Computational Linguistics, 2023. p. 1-6.
Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
Harvard
Raihan, MN, Tanmoy, U, Islam, AB, North, K, Ranasinghe, T, Anastasopoulos, A & Zampieri, M 2023,
Offensive Language Identification in Transliterated and Code-Mixed Bangla. in F Alam, S Kar, SA Chowdhury, F Sadeque & R Amin (eds),
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023). Association for Computational Linguistics, pp. 1-6, The First Workshop on Bangla Language Processing (BLP-2023), Singapore,
7/12/23. <https://aclanthology.org/2023.banglalp-1.1/>
APA
Raihan, M. N., Tanmoy, U., Islam, A. B., North, K., Ranasinghe, T., Anastasopoulos, A., & Zampieri, M. (2023).
Offensive Language Identification in Transliterated and Code-Mixed Bangla. In F. Alam, S. Kar, S. A. Chowdhury, F. Sadeque, & R. Amin (Eds.),
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023) (pp. 1-6). Association for Computational Linguistics.
https://aclanthology.org/2023.banglalp-1.1/
Vancouver
Raihan MN, Tanmoy U, Islam AB, North K, Ranasinghe T, Anastasopoulos A et al.
Offensive Language Identification in Transliterated and Code-Mixed Bangla. In Alam F, Kar S, Chowdhury SA, Sadeque F, Amin R, editors, Proceedings of the First Workshop on Bangla Language Processing (BLP-2023). Association for Computational Linguistics. 2023. p. 1-6
Bibtex
@inproceedings{2fc1ee40fcc64ce483251cc21fce3748,
title = "Offensive Language Identification in Transliterated and Code-Mixed Bangla",
abstract = "Identifying offensive content in social media is vital to create safe online communities. Several recent studies have addressed this problem by creating datasets for various languages. In this paper, we explore offensive language identification in texts with transliterations and code-mixing, linguistic phenomena common in multilingual societies, and a known challenge for NLP systems. We introduce TB-OLID, a transliterated Bangla offensive language dataset containing 5,000 manually annotated comments. We train and fine-tune machine learning models on TB-OLID, and we evaluate their results on this dataset. Our results show that English pre-trained transformer-based models, such as fBERT and HateBERT achieve the best performance on this dataset.",
author = "Raihan, {Md Nishat} and Umma Tanmoy and Islam, {Anika Binte} and Kai North and Tharindu Ranasinghe and Antonios Anastasopoulos and Marcos Zampieri",
year = "2023",
month = dec,
day = "7",
language = "English",
isbn = "9798891760585",
pages = "1--6",
editor = "Firoj Alam and Sudipta Kar and Chowdhury, {Shammur Absar} and Farig Sadeque and Ruhul Amin",
booktitle = "Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)",
publisher = "Association for Computational Linguistics",
note = "The First Workshop on Bangla Language Processing (BLP-2023) ; Conference date: 07-12-2023",
url = "https://blp-workshop.github.io/",
}
RIS
TY - GEN
T1 - Offensive Language Identification in Transliterated and Code-Mixed Bangla
AU - Raihan, Md Nishat
AU - Tanmoy, Umma
AU - Islam, Anika Binte
AU - North, Kai
AU - Ranasinghe, Tharindu
AU - Anastasopoulos, Antonios
AU - Zampieri, Marcos
PY - 2023/12/7
Y1 - 2023/12/7
N2 - Identifying offensive content in social media is vital to create safe online communities. Several recent studies have addressed this problem by creating datasets for various languages. In this paper, we explore offensive language identification in texts with transliterations and code-mixing, linguistic phenomena common in multilingual societies, and a known challenge for NLP systems. We introduce TB-OLID, a transliterated Bangla offensive language dataset containing 5,000 manually annotated comments. We train and fine-tune machine learning models on TB-OLID, and we evaluate their results on this dataset. Our results show that English pre-trained transformer-based models, such as fBERT and HateBERT achieve the best performance on this dataset.
AB - Identifying offensive content in social media is vital to create safe online communities. Several recent studies have addressed this problem by creating datasets for various languages. In this paper, we explore offensive language identification in texts with transliterations and code-mixing, linguistic phenomena common in multilingual societies, and a known challenge for NLP systems. We introduce TB-OLID, a transliterated Bangla offensive language dataset containing 5,000 manually annotated comments. We train and fine-tune machine learning models on TB-OLID, and we evaluate their results on this dataset. Our results show that English pre-trained transformer-based models, such as fBERT and HateBERT achieve the best performance on this dataset.
M3 - Conference contribution/Paper
SN - 9798891760585
SP - 1
EP - 6
BT - Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)
A2 - Alam, Firoj
A2 - Kar, Sudipta
A2 - Chowdhury, Shammur Absar
A2 - Sadeque, Farig
A2 - Amin, Ruhul
PB - Association for Computational Linguistics
T2 - The First Workshop on Bangla Language Processing (BLP-2023)
Y2 - 7 December 2023
ER -