Electronic data

  • 2023boltphd

    Final published version, 3.29 MB, PDF document

Statistical Methods for Samples of Interaction Networks

Research output: Thesis › Doctoral Thesis

Published
Publication date: 12/2023
Number of pages: 255
Qualification: PhD
Awarding Institution
Supervisors/Advisors
Award date: 20/12/2023
Publisher
  • Lancaster University
Original language: English

Abstract

Network data arises through the observation of relational information between a collection of entities. A ubiquitous example is the social network, where friendships amongst a sample of people are observed. However, other subtly different forms of data that also fit this description have begun to appear. Notably, recent work in the literature has considered cases where (i) the units of observation within a network are edges or paths, often referred to as interaction networks, with examples such as emails between people or a series of page visits to a website by a user, and (ii) one observes a sample of networks; for example, in neuroscience applications, brain scan data of a single patient is often processed into a network representation, so that a multi-patient study leads to a sample of networks.

However, the intersection of (i) and (ii), where a sample of interaction networks is observed, has not yet been considered. For example, one might observe a sample of users navigating the same website. Using currently proposed methods to analyse such data would either be inappropriate or require one to first aggregate the data into another form, incurring a potential loss of information. Motivated by this gap in the literature, this thesis proposes statistical methods suitable for the analysis of samples of interaction networks.

In this regard, two main contributions are made. Firstly, the problem of measuring the distance between two interaction networks is considered. Distances are a useful and versatile tool, opening the door to a variety of analytical methodologies, such as dimension reduction and clustering algorithms. Secondly, building upon this work, the problem of summarising a sample of interaction networks is considered. Of particular focus is obtaining analogues of the mean and variance in this non-trivial scenario, that is, where data points are themselves interaction networks. To this end, a novel Bayesian modelling framework is proposed. Given a user-specified distance measure, we construct Gaussian-like distributions over the space of interaction networks, that is, models parameterised via location and scale. This approach raises significant computational challenges: not only are the resulting posterior distributions doubly intractable, but the parameter space includes the space of interaction networks, which is both discrete and multidimensional. As such, specialised Markov chain Monte Carlo (MCMC) algorithms are developed which circumvent these issues, facilitating parameter inference for the proposed models. Crucially, the location and scale parameters provide analogues of the mean and variance, respectively, resulting in the desired summary measures.
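As a rough illustration of the ideas in this paragraph, the following Python sketch treats an interaction network as a list of paths, defines one simple distance between two such networks, and evaluates a Gaussian-like log-density up to its normalising constant. Everything here is an assumption made for illustration: the multiset symmetric-difference distance, the names `multiset_distance`, `log_unnormalised_density`, `sample_medoid` and `gamma`, and the toy website data are hypothetical and are not taken from the thesis. The medoid is only a crude stand-in for a location summary, not the Bayesian inference procedure described above.

```python
from collections import Counter


def multiset_distance(net_a, net_b):
    """Size of the multiset symmetric difference between two lists of paths.

    Assumed for illustration only; the abstract does not specify a distance.
    """
    count_a = Counter(map(tuple, net_a))
    count_b = Counter(map(tuple, net_b))
    # Counter subtraction keeps only positive counts, so each term counts
    # the paths present in one network but unmatched in the other.
    return sum((count_a - count_b).values()) + sum((count_b - count_a).values())


def log_unnormalised_density(net, location, gamma, distance=multiset_distance):
    """Gaussian-like log-density, up to an additive normalising constant.

    `location` plays the role of a mean; `gamma` is a hypothetical
    concentration parameter (larger values concentrate mass near `location`).
    """
    return -gamma * distance(net, location)


def sample_medoid(sample, distance=multiset_distance):
    """Return the sample element with the smallest total distance to the sample."""
    return min(sample, key=lambda cand: sum(distance(cand, other) for other in sample))


# Toy sample: each network is one user's set of navigation paths on a website.
sample = [
    [("home", "shop", "cart"), ("home", "blog")],
    [("home", "shop", "cart"), ("home", "about")],
    [("home", "shop", "cart"), ("home", "blog"), ("home", "blog")],
]

medoid = sample_medoid(sample)
log_density = log_unnormalised_density(sample[1], medoid, gamma=1.5)
```

The omitted normalising constant depends on both the location and the concentration parameter, and computing it would require summing over every possible interaction network. A likelihood with this kind of parameter-dependent, intractable constant is what typically makes a posterior doubly intractable.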

Across both pieces of work, simulation studies are undertaken to confirm the efficacy of proposed methods and to explore their properties. Additionally, their practical applications are illustrated through example analyses of two open-source datasets: (i) an in-play football dataset released by StatsBomb, and (ii) a dataset of user interactions with the location-based social network Foursquare.