Predicting the type and target of offensive social media posts in Marathi

Computing and Communications

Text available via DOI:

https://doi.org/10.1007/s13278-022-00906-8
Final published version

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Predicting the type and target of offensive social media posts in Marathi. / Zampieri, Marcos; Ranasinghe, Tharindu; Chaudhari, Mrinal et al.
In: Social Network Analysis and Mining, Vol. 12, 77, 09.07.2022.

Harvard

Zampieri, M, Ranasinghe, T, Chaudhari, M, Sampatrao Gaikwad, S, Krishna, P, Nene, M & Paygude, S 2022, 'Predicting the type and target of offensive social media posts in Marathi', Social Network Analysis and Mining, vol. 12, 77. https://doi.org/10.1007/s13278-022-00906-8

APA

Zampieri, M., Ranasinghe, T., Chaudhari, M., Sampatrao Gaikwad, S., Krishna, P., Nene, M., & Paygude, S. (2022). Predicting the type and target of offensive social media posts in Marathi. Social Network Analysis and Mining, 12, Article 77. https://doi.org/10.1007/s13278-022-00906-8

Vancouver

Zampieri M, Ranasinghe T, Chaudhari M, Sampatrao Gaikwad S, Krishna P, Nene M et al. Predicting the type and target of offensive social media posts in Marathi. Social Network Analysis and Mining. 2022 Jul 9;12:77. doi: 10.1007/s13278-022-00906-8

Author

Zampieri, Marcos; Ranasinghe, Tharindu; Chaudhari, Mrinal et al. / Predicting the type and target of offensive social media posts in Marathi. In: Social Network Analysis and Mining. 2022; Vol. 12.

Bibtex

@article{105ed6cd6a53486a91db5a2469d77abe,

title = "Predicting the type and target of offensive social media posts in Marathi",

abstract = "The presence of offensive language on social media is very common motivating platforms to invest in strategies to make communities safer. This includes developing robust machine learning systems capable of recognizing offensive content online. Apart from a few notable exceptions, most research on automatic offensive language identification has dealt with English and a few other high-resource languages such as French, German, and Spanish. In this paper, we address this gap by tackling offensive language identification in Marathi, a low-resource Indo-Aryan language spoken in India. We introduce the Marathi Offensive Language Dataset v.2.0 or MOLD 2.0 and present multiple experiments on this dataset. MOLD 2.0 is a much larger version of MOLD with expanded annotation to the levels B (type) and C (target) of the popular OLID taxonomy. MOLD 2.0 is the first hierarchical offensive language dataset compiled for Marathi, thus opening new avenues for research in low-resource Indo-Aryan languages. Finally, we also introduce SeMOLD, a larger dataset annotated following the semi-supervised methods presented in SOLID (Rosenthal et al. in SOLID: a large-scale semi-supervised dataset for offensive language identification. In: Findings of ACL, 2021).",

author = "Marcos Zampieri and Tharindu Ranasinghe and Mrinal Chaudhari and {Sampatrao Gaikwad}, Saurabh and Prajwal Krishna and Mayuresh Nene and Shrunali Paygude",

year = "2022",

month = jul,

day = "9",

doi = "10.1007/s13278-022-00906-8",

language = "English",

volume = "12",

journal = "Social Network Analysis and Mining",

issn = "1869-5469",

publisher = "Springer",

}

RIS

TY - JOUR

T1 - Predicting the type and target of offensive social media posts in Marathi

AU - Zampieri, Marcos

AU - Ranasinghe, Tharindu

AU - Chaudhari, Mrinal

AU - Sampatrao Gaikwad, Saurabh

AU - Krishna, Prajwal

AU - Nene, Mayuresh

AU - Paygude, Shrunali

PY - 2022/7/9

Y1 - 2022/7/9

AB - The presence of offensive language on social media is very common motivating platforms to invest in strategies to make communities safer. This includes developing robust machine learning systems capable of recognizing offensive content online. Apart from a few notable exceptions, most research on automatic offensive language identification has dealt with English and a few other high-resource languages such as French, German, and Spanish. In this paper, we address this gap by tackling offensive language identification in Marathi, a low-resource Indo-Aryan language spoken in India. We introduce the Marathi Offensive Language Dataset v.2.0 or MOLD 2.0 and present multiple experiments on this dataset. MOLD 2.0 is a much larger version of MOLD with expanded annotation to the levels B (type) and C (target) of the popular OLID taxonomy. MOLD 2.0 is the first hierarchical offensive language dataset compiled for Marathi, thus opening new avenues for research in low-resource Indo-Aryan languages. Finally, we also introduce SeMOLD, a larger dataset annotated following the semi-supervised methods presented in SOLID (Rosenthal et al. in SOLID: a large-scale semi-supervised dataset for offensive language identification. In: Findings of ACL, 2021).

U2 - 10.1007/s13278-022-00906-8

DO - 10.1007/s13278-022-00906-8

M3 - Journal article

VL - 12

JO - Social Network Analysis and Mining

JF - Social Network Analysis and Mining

SN - 1869-5469

M1 - 77

ER -
