Statistical Methods for Modelling Complex Longitudinal Data with Applications in Cancer Pharmacogenetics and Ageing

Research output: Thesis (Doctoral Thesis)

Publication date: 2022
Number of pages: 214
Awarding Institution:
  • Lancaster University
Original language: English


Technological and scientific advances have promoted data gathering across multiple disciplines, emphasizing the need for rigorous statistical methods from which to draw conclusions. Longitudinal data are a key tool for studying temporal changes; however, as data complexity increases, existing methodologies are often unable to capture non-linear or non-stationary trends. Additionally, irregularly collected, non-continuous or high-dimensional data make statistical analysis even more challenging. In this work, we develop three statistical models to analyse complex longitudinal data from two real-world databases: the Genomics of Drug Sensitivity in Cancer and the English Longitudinal Study of Ageing.

The first part of this work is motivated by the Genomics of Drug Sensitivity in Cancer project and focuses on the prediction and detection of biomarkers associated with anti-cancer drug dose-response. Here, the available longitudinal data are characterised by completely observed trajectories of drug response over multiple drug dosages, which are potentially associated in a non-stationary manner with high-dimensional covariates (including expression profiles of tens of thousands of genes). These trends are not easily amenable to analysis by classic parametric or semi-parametric mixed models, especially when the covariates are high-dimensional. We build a dose-varying regression model combined with a two-stage variable selection algorithm (variable screening followed by penalised regression) to identify genetic factors associated with drug response and to estimate their effects across dosages.
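The two-stage idea (screening followed by penalised regression) can be illustrated with a minimal sketch. This is not the dose-varying model developed in the thesis: it assumes a single scalar response per sample, uses marginal-correlation screening and a plain lasso fitted by coordinate descent as stand-ins, and runs on simulated data. All function names, the `keep` and `lam` settings, and the simulated effect sizes are hypothetical choices made for illustration.

```python
import numpy as np

def screen(X, y, keep):
    """Stage 1 (assumed form): keep the `keep` covariates with the largest
    absolute marginal correlation with the response."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    score = np.abs(Xc.T @ yc) / len(y)
    return np.sort(np.argsort(score)[::-1][:keep])

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Stage 2 (assumed form): lasso by cyclic coordinate descent, minimising
    (1/2n)||y - X b||^2 + lam * ||b||_1 for centred X and y."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]      # add back covariate j's current fit
            rho = X[:, j] @ resid / n       # partial covariance with residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]      # remove the updated fit again
    return beta

def two_stage_select(X, y, keep=20, lam=0.2):
    """Screen first, then fit the lasso on the surviving covariates only."""
    idx = screen(X, y, keep)
    Xs = X[:, idx] - X[:, idx].mean(axis=0)
    beta = lasso_cd(Xs, y - y.mean(), lam)
    nonzero = beta != 0
    return idx[nonzero], beta[nonzero]

# Simulated example: 500 covariates, only columns 3 and 42 carry signal.
rng = np.random.default_rng(0)
n, p = 100, 500
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 3] - 1.5 * X[:, 42] + 0.5 * rng.standard_normal(n)
selected, coefs = two_stage_select(X, y, keep=20, lam=0.2)
```

Screening reduces the problem from `p` covariates to `keep` before the penalised fit, which is what makes the second stage tractable when `p` is in the tens of thousands. In a dose-varying setting the coefficients would be functions of dose rather than scalars, which this sketch does not attempt.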

The second part of this work is motivated by the English Longitudinal Study of Ageing data set. The longitudinal data available in this study are characterised by irregularly collected and often incomplete trajectories, and by many ordinal response variables that measure only a small number of ageing domains (the data are derived from multiple questionnaires covering multiple aspects of older people's lives). The ultimate aim is to understand the dynamics of ageing and to study the interrelationships between the factors associated with it. To do so, we first explore the theoretical foundations of ageing and the data set itself. Next, we adopt and extend the methodological framework of Dawson and Müller (2018) to estimate the quantile dynamics and derive predictions for frailty, a common surrogate of ageing, addressing the problem of incomplete individual responses over the age interval of interest. Finally, we develop a bivariate Gaussian process framework for ordinal and potentially irregularly sampled data, which allows the available questionnaire responses to be modelled directly. Here, the unobserved ageing domains are assumed to be smooth functions of age. This method allows the interrelationships between several ageing domains to be assessed after adjusting for individual variation across the observed longitudinal trajectories.
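To make the last idea concrete, the following is a rough generative sketch of ordinal responses driven by two correlated latent functions of age. It is only an illustration under stated assumptions, not the framework developed in the thesis: it assumes a squared-exponential kernel, a separable cross-covariance (a 2x2 matrix `B` combined with the age kernel by a Kronecker product), and a cumulative probit link with fixed cut-points. The kernel settings, the correlation of 0.6, and the cut-points are all hypothetical.

```python
import numpy as np
from math import erf, sqrt

def rbf_kernel(a, b, length=5.0, var=1.0):
    """Squared-exponential kernel: encodes smoothness of a latent domain in age."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def norm_cdf(x):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(erf)(x / sqrt(2.0)))

def ordinal_probs(f, cutpoints):
    """P(Y = k | f) for k = 0..K-1 under a cumulative probit link."""
    c = np.concatenate(([-np.inf], cutpoints, [np.inf]))
    cdf = norm_cdf(c[None, :] - f[:, None])
    return np.diff(cdf, axis=1)

rng = np.random.default_rng(1)

# Irregularly spaced observation ages (simulated).
ages = np.sort(rng.uniform(50.0, 90.0, size=30))
n = len(ages)

# Assumed separable bivariate covariance: B couples the two latent domains.
B = np.array([[1.0, 0.6],
              [0.6, 1.0]])
K_full = np.kron(B, rbf_kernel(ages, ages))

# Draw both latent functions jointly; jitter keeps the Cholesky factor stable.
L = np.linalg.cholesky(K_full + 1e-6 * np.eye(2 * n))
f = L @ rng.standard_normal(2 * n)
f1, f2 = f[:n], f[n:]

# Map each latent function to a four-category ordinal questionnaire item.
cutpoints = np.array([-1.0, 0.0, 1.0])
probs1 = ordinal_probs(f1, cutpoints)
probs2 = ordinal_probs(f2, cutpoints)

def sample_ordinal(probs, rng):
    """Sample one category per row by inverting the cumulative probabilities."""
    u = rng.uniform(size=probs.shape[0])
    cum = np.cumsum(probs, axis=1)[:, :-1]
    return (u[:, None] > cum).sum(axis=1)

y1 = sample_ordinal(probs1, rng)
y2 = sample_ordinal(probs2, rng)
```

In this sketch the off-diagonal entry of `B` plays the role of the interrelationship between the two latent domains. Fitting such a model to data would require inference over the latent functions given the ordinal responses, which is not shown here.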