The volume of illegal material on file-sharing networks poses a challenge for investigators attempting to police such networks. We propose a novel approach that automates the resource-intensive task of identifying previously unknown files of interest amongst the hundreds of thousands of files shared on such networks. We also describe how this approach could be used to identify clusters of peers that might be closely related to each other, either as members of a syndicate or as multiple personae of the same individual.
Our approach is based on the collaborative filtering techniques typically used in recommender systems. We find that collaborative filtering can successfully identify new media belonging to specific categories of interest to an investigation of a peer-to-peer network, without examining filenames or file contents. We also find evidence that the distance metrics used in collaborative filtering could be useful for clustering and identifying peers on file-sharing networks. Finally, we report an unsuccessful attempt to use collaborative filtering to predict the future file-sharing behaviour of peers.
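To make the idea concrete, the following is a minimal sketch of item-based collaborative filtering. It assumes that each peer is represented only by the set of file hashes it shares, with no filenames or contents. Each unlabelled file is scored by the cosine similarity between its set of sharing peers and those of files already known to be of interest. The same cosine measure, taken as a distance between peers' file sets, hints at how peers might be clustered. The data, peer names, hashes and function names (`rank_candidates`, `peer_distance`) are hypothetical. Cosine similarity is one common choice and not necessarily the metric used in this study.

```python
import math

# Hypothetical example data: each peer maps to the set of file hashes it shares.
# Peer names and hashes are invented for illustration only.
shares = {
    "peerA": {"f1", "f2", "f3"},
    "peerB": {"f1", "f2", "f4"},
    "peerC": {"f2", "f3", "f4"},
    "peerD": {"f5", "f6"},
}


def file_peers(shares):
    """Invert the peer -> files mapping into file -> set of peers sharing it."""
    index = {}
    for peer, files in shares.items():
        for f in files:
            index.setdefault(f, set()).add(peer)
    return index


def cosine(a, b):
    """Cosine similarity between two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank_candidates(shares, known_files):
    """Rank files not yet labelled by their summed item-item cosine
    similarity to the known files of interest (highest first, ties by name)."""
    index = file_peers(shares)
    scores = {}
    for f, peers in index.items():
        if f in known_files:
            continue
        scores[f] = sum(cosine(peers, index.get(k, set())) for k in known_files)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def peer_distance(shares, p, q):
    """Cosine distance (1 - similarity) between two peers' shared-file sets."""
    return 1.0 - cosine(shares[p], shares[q])


print(rank_candidates(shares, {"f1"})[0][0])  # prints "f2"
```

Here `rank_candidates(shares, {"f1"})` ranks `f2` first because every peer sharing `f1` also shares `f2`. `peerD`, which shares no files with the others, sits at the maximum cosine distance of 1.0 from `peerA`.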