Detecting deceptive behaviour in the wild - Research Portal

Computing and Communications

Associated organisational unit

Data Science

Electronic data

2018peersmanphd
Final published version, 2.15 MB, PDF document
Available under license: CC BY-ND: Creative Commons Attribution-NoDerivatives 4.0 International License

Text available via DOI:

https://doi.org/10.17635/lancaster/thesis/553
Final published version

Detecting deceptive behaviour in the wild: text mining for online child protection in the presence of noisy and adversarial social media communications

Research output: Thesis › Doctoral Thesis

Published

Claudia Peersman

Publication date	2018
Number of pages	184
Qualification	PhD
Awarding Institution	Lancaster University
Supervisors/Advisors	Rashid, Awais, Supervisor; Baron, Alistair, Supervisor
Publisher	Lancaster University
Original language	English

Abstract

A real-life application of text mining research “in the wild”, i.e. in online social media, differs
from more general applications in that its defining characteristics are both domain and process
dependent. This gives rise to a number of challenges of which contemporary research has only
scratched the surface. More specifically, a text mining approach applied in the wild typically
has no control over the dataset size. Hence, the system has to be robust towards limited data
availability, a variable number of samples across users and a highly skewed dataset. Additionally,
the quality of the data cannot be guaranteed. As a result, the approach needs to be tolerant to
a certain degree of linguistic noise. Finally, it has to be robust towards deceptive behaviour or
adversaries.
This thesis examines the viability of a text mining approach for supporting cybercrime
investigations pertaining to online child protection. The main contributions of this dissertation
are as follows. A systematic study of different aspects of methodological design of a state-of-the-art
text mining approach is presented to assess its scalability towards a large, imbalanced
and linguistically noisy social media dataset. In this framework, three key automatic text
categorisation tasks are examined, namely the feasibility of (i) identifying a social network user’s age
group and gender based on textual information found in a single message; (ii) aggregating
predictions on the message level to the user level without neglecting potential clues of deception
and detecting false user profiles on social networks; and (iii) identifying child sexual abuse media among
thousands of other, legal media, including adult pornography, based on their filenames. Finally, a
novel approach is presented that combines age group predictions with advanced text clustering
techniques and unsupervised learning to identify online child sex offenders’ grooming behaviour.
The methodology presented in this thesis was extensively discussed with law enforcement
to assess its forensic readiness. Additionally, each component was evaluated on actual child sex
offender data. Despite the challenging characteristics of these text types, the results show high
degrees of accuracy for false profile detection, identifying grooming behaviour and child sexual
abuse media identification.
