Research output: Thesis › Doctoral Thesis

Published

Lancaster University, 2023. 255 p.

Bolt, G 2023, 'Statistical Methods for Samples of Interaction Networks', PhD, Lancaster University. https://doi.org/10.17635/lancaster/thesis/2205

Bolt, G. (2023). *Statistical Methods for Samples of Interaction Networks*. [Doctoral Thesis, Lancaster University]. Lancaster University. https://doi.org/10.17635/lancaster/thesis/2205

Bolt G. Statistical Methods for Samples of Interaction Networks. Lancaster University, 2023. 255 p. doi: 10.17635/lancaster/thesis/2205

@phdthesis{057ed13c193f4c849367870c74a8c8c1,

title = "Statistical Methods for Samples of Interaction Networks",

abstract = "Network data arises through the observation of relational information between a collection of entities. A ubiquitous example of such data is a social network, where friendships amongst a sample of people are observed. However, there have begun to appear other subtly different forms of data which also fit this description. Notably, work in the literature has recently considered when (i) the units of observation within a network are edges or paths, often referred to as interaction networks, with examples such as emails between people or a series of page visits to a website by a user, and (ii) one observes a sample of networks; for example, in neuroscience applications, brain scan data of a single patient is often processed into a network representation, with a multi-patient study thus leading to a sample of networks. However, the intersection of (i) and (ii) has presently not been considered, that is, where a sample of interaction networks is observed. For example, one might observe a sample of users navigating the same website. Use of currently proposed methods to analyse such data would either be inappropriate or require one to first aggregate data into another form, incurring a potential loss of information. Motivated by this gap in the literature, this thesis proposes statistical methods suitable for the analysis of samples of interaction networks. In this regard, two main contributions are made. Firstly, the problem of measuring the distance between two interaction networks is considered. Distances are an incredibly useful and versatile tool, opening the door to a variety of analytical methodologies, such as dimension reduction and clustering algorithms. Secondly, building upon this work, the problem of summarising a sample of interaction networks is considered. Of particular focus is obtaining analogues of the mean and variance in this non-trivial scenario, that is, where data points are themselves interaction networks.
To this end, a novel Bayesian modelling framework is proposed. Given a user-specified distance measure, we construct Gaussian-like distributions over the space of interaction networks, that is, models parameterised via location and scale. This approach raises significant computational challenges; not only are resulting posterior distributions doubly-intractable, but the parameter space includes the space of interaction networks, which is both discrete and multidimensional. As such, specialised Markov chain Monte Carlo (MCMC) algorithms are developed which circumvent these issues, facilitating parameter inference for the proposed models. Crucially, the location and scale parameters provide analogues of the mean and variance, respectively, resulting in the desired summary measures. Across both pieces of work, simulation studies are undertaken to confirm the efficacy of proposed methods and to explore their properties. Additionally, their practical applications are illustrated through example analyses of two open-source datasets: (i) an in-play football dataset released by StatsBomb, and (ii) a dataset of user interactions with the location-based social network Foursquare.",

keywords = "Network data, Network science, Bayesian statistics, Markov chain Monte Carlo (MCMC), Distance measures",

author = "George Bolt",

year = "2023",

month = dec,

doi = "10.17635/lancaster/thesis/2205",

language = "English",

publisher = "Lancaster University",

school = "Lancaster University",

}

TY - BOOK

T1 - Statistical Methods for Samples of Interaction Networks

AU - Bolt, George

PY - 2023/12

Y1 - 2023/12

AB - Network data arises through the observation of relational information between a collection of entities. A ubiquitous example of such data is a social network, where friendships amongst a sample of people are observed. However, there have begun to appear other subtly different forms of data which also fit this description. Notably, work in the literature has recently considered when (i) the units of observation within a network are edges or paths, often referred to as interaction networks, with examples such as emails between people or a series of page visits to a website by a user, and (ii) one observes a sample of networks; for example, in neuroscience applications, brain scan data of a single patient is often processed into a network representation, with a multi-patient study thus leading to a sample of networks. However, the intersection of (i) and (ii) has presently not been considered, that is, where a sample of interaction networks is observed. For example, one might observe a sample of users navigating the same website. Use of currently proposed methods to analyse such data would either be inappropriate or require one to first aggregate data into another form, incurring a potential loss of information. Motivated by this gap in the literature, this thesis proposes statistical methods suitable for the analysis of samples of interaction networks. In this regard, two main contributions are made. Firstly, the problem of measuring the distance between two interaction networks is considered. Distances are an incredibly useful and versatile tool, opening the door to a variety of analytical methodologies, such as dimension reduction and clustering algorithms. Secondly, building upon this work, the problem of summarising a sample of interaction networks is considered. Of particular focus is obtaining analogues of the mean and variance in this non-trivial scenario, that is, where data points are themselves interaction networks.
To this end, a novel Bayesian modelling framework is proposed. Given a user-specified distance measure, we construct Gaussian-like distributions over the space of interaction networks, that is, models parameterised via location and scale. This approach raises significant computational challenges; not only are resulting posterior distributions doubly-intractable, but the parameter space includes the space of interaction networks, which is both discrete and multidimensional. As such, specialised Markov chain Monte Carlo (MCMC) algorithms are developed which circumvent these issues, facilitating parameter inference for the proposed models. Crucially, the location and scale parameters provide analogues of the mean and variance, respectively, resulting in the desired summary measures. Across both pieces of work, simulation studies are undertaken to confirm the efficacy of proposed methods and to explore their properties. Additionally, their practical applications are illustrated through example analyses of two open-source datasets: (i) an in-play football dataset released by StatsBomb, and (ii) a dataset of user interactions with the location-based social network Foursquare.

KW - Network data

KW - Network science

KW - Bayesian statistics

KW - Markov chain Monte Carlo (MCMC)

KW - Distance measures

U2 - 10.17635/lancaster/thesis/2205

DO - 10.17635/lancaster/thesis/2205

M3 - Doctoral Thesis

PB - Lancaster University

ER -