Offensive Language Identification in Transliterated and Code-Mixed Bangla

Computing and Communications

Electronic data

2023.banglalp-1.1
Final published version, 104 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Offensive Language Identification in Transliterated and Code-Mixed Bangla. / Raihan, Md Nishat; Tanmoy, Umma; Islam, Anika Binte et al.
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023). ed. / Firoj Alam; Sudipta Kar; Shammur Absar Chowdhury; Farig Sadeque; Ruhul Amin. Association for Computational Linguistics, 2023. p. 1-6.

Harvard

Raihan, MN, Tanmoy, U, Islam, AB, North, K, Ranasinghe, T, Anastasopoulos, A & Zampieri, M 2023, Offensive Language Identification in Transliterated and Code-Mixed Bangla. in F Alam, S Kar, SA Chowdhury, F Sadeque & R Amin (eds), Proceedings of the First Workshop on Bangla Language Processing (BLP-2023). Association for Computational Linguistics, pp. 1-6, The First Workshop on Bangla Language Processing (BLP-2023), Singapore, 7/12/23. <https://aclanthology.org/2023.banglalp-1.1/>

APA

Raihan, M. N., Tanmoy, U., Islam, A. B., North, K., Ranasinghe, T., Anastasopoulos, A., & Zampieri, M. (2023). Offensive Language Identification in Transliterated and Code-Mixed Bangla. In F. Alam, S. Kar, S. A. Chowdhury, F. Sadeque, & R. Amin (Eds.), Proceedings of the First Workshop on Bangla Language Processing (BLP-2023) (pp. 1-6). Association for Computational Linguistics. https://aclanthology.org/2023.banglalp-1.1/

Vancouver

Raihan MN, Tanmoy U, Islam AB, North K, Ranasinghe T, Anastasopoulos A et al. Offensive Language Identification in Transliterated and Code-Mixed Bangla. In Alam F, Kar S, Chowdhury SA, Sadeque F, Amin R, editors, Proceedings of the First Workshop on Bangla Language Processing (BLP-2023). Association for Computational Linguistics. 2023. p. 1-6.

Author

Raihan, Md Nishat ; Tanmoy, Umma ; Islam, Anika Binte et al. / Offensive Language Identification in Transliterated and Code-Mixed Bangla. Proceedings of the First Workshop on Bangla Language Processing (BLP-2023). editor / Firoj Alam ; Sudipta Kar ; Shammur Absar Chowdhury ; Farig Sadeque ; Ruhul Amin. Association for Computational Linguistics, 2023. pp. 1-6

Bibtex

@inproceedings{2fc1ee40fcc64ce483251cc21fce3748,
  title = "Offensive Language Identification in Transliterated and Code-Mixed Bangla",
  abstract = "Identifying offensive content in social media is vital to create safe online communities. Several recent studies have addressed this problem by creating datasets for various languages. In this paper, we explore offensive language identification in texts with transliterations and code-mixing, linguistic phenomena common in multilingual societies, and a known challenge for NLP systems. We introduce TB-OLID, a transliterated Bangla offensive language dataset containing 5,000 manually annotated comments. We train and fine-tune machine learning models on TB-OLID, and we evaluate their results on this dataset. Our results show that English pre-trained transformer-based models, such as fBERT and HateBERT achieve the best performance on this dataset.",
  author = "Raihan, {Md Nishat} and Umma Tanmoy and Islam, {Anika Binte} and Kai North and Tharindu Ranasinghe and Antonios Anastasopoulos and Marcos Zampieri",
  year = "2023",
  month = dec,
  day = "7",
  language = "English",
  isbn = "9798891760585",
  pages = "1--6",
  editor = "Firoj Alam and Sudipta Kar and Chowdhury, {Shammur Absar} and Farig Sadeque and Ruhul Amin",
  booktitle = "Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)",
  publisher = "Association for Computational Linguistics",
  note = "The First Workshop on Bangla Language Processing (BLP-2023) ; Conference date: 07-12-2023",
  url = "https://blp-workshop.github.io/",
}

RIS

TY  - GEN
T1  - Offensive Language Identification in Transliterated and Code-Mixed Bangla
AU  - Raihan, Md Nishat
AU  - Tanmoy, Umma
AU  - Islam, Anika Binte
AU  - North, Kai
AU  - Ranasinghe, Tharindu
AU  - Anastasopoulos, Antonios
AU  - Zampieri, Marcos
PY  - 2023/12/7
Y1  - 2023/12/7
N2  - Identifying offensive content in social media is vital to create safe online communities. Several recent studies have addressed this problem by creating datasets for various languages. In this paper, we explore offensive language identification in texts with transliterations and code-mixing, linguistic phenomena common in multilingual societies, and a known challenge for NLP systems. We introduce TB-OLID, a transliterated Bangla offensive language dataset containing 5,000 manually annotated comments. We train and fine-tune machine learning models on TB-OLID, and we evaluate their results on this dataset. Our results show that English pre-trained transformer-based models, such as fBERT and HateBERT achieve the best performance on this dataset.
AB  - Identifying offensive content in social media is vital to create safe online communities. Several recent studies have addressed this problem by creating datasets for various languages. In this paper, we explore offensive language identification in texts with transliterations and code-mixing, linguistic phenomena common in multilingual societies, and a known challenge for NLP systems. We introduce TB-OLID, a transliterated Bangla offensive language dataset containing 5,000 manually annotated comments. We train and fine-tune machine learning models on TB-OLID, and we evaluate their results on this dataset. Our results show that English pre-trained transformer-based models, such as fBERT and HateBERT achieve the best performance on this dataset.
M3  - Conference contribution/Paper
SN  - 9798891760585
SP  - 1
EP  - 6
BT  - Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)
A2  - Alam, Firoj
A2  - Kar, Sudipta
A2  - Chowdhury, Shammur Absar
A2  - Sadeque, Farig
A2  - Amin, Ruhul
PB  - Association for Computational Linguistics
T2  - The First Workshop on Bangla Language Processing (BLP-2023)
Y2  - 7 December 2023
ER  -
