Statistical Methods for Modelling Complex Longitudinal Data with Applications in Cancer Pharmacogenetics and Ageing

Associated organisational unit

Extreme Value Theory

Electronic data

2022koukouliphd
Final published version, 9.39 MB, PDF document

Text available via DOI:

https://doi.org/10.17635/lancaster/thesis/1792
Final published version

Keywords

Longitudinal data analysis, Repeated measurements, Non-parametric estimation, Gaussian processes, Quantile regression, English Longitudinal Study of Ageing, Genomics of Drug Sensitivity in Cancer, Varying coefficients model, High-dimensional problems, Multivariate longitudinal data

Research output: Thesis › Doctoral Thesis

Published

Evanthia Koukouli

Publication date	2022
Number of pages	214
Qualification	PhD
Awarding Institution	Lancaster University
Supervisors/Advisors	Park, Juhyun (Supervisor); Titman, Andrew (Supervisor); Doebler, Stefanie (Supervisor)
Publisher	Lancaster University
Original language	English

Abstract

Technological and scientific advances have promoted data gathering across multiple disciplines, emphasising the need for rigorous statistical methods from which to draw conclusions. Longitudinal data are a key tool for studying temporal change; however, as data complexity increases, existing methodologies are often unable to capture non-linear or non-stationary trends. Additionally, irregularly collected, non-continuous or high-dimensional data make statistical analysis even more challenging. In this work, we develop three statistical models to analyse complex longitudinal data from two real-world databases: the Genomics of Drug Sensitivity in Cancer and the English Longitudinal Study of Ageing.

The first part of this work is motivated by the Genomics of Drug Sensitivity in Cancer project and focuses on the prediction and detection of biomarkers associated with anti-cancer drug dose-response. Here, the available longitudinal data are characterised by completely observed trajectories of drug response over multiple drug dosages, which are potentially associated, in a non-stationary manner, with high-dimensional covariates (including expression profiles of tens of thousands of genes). Such trends are not easily amenable to analysis by classic parametric or semi-parametric mixed models, especially when the covariates are high-dimensional. We build a dose-varying regression model combined with a two-stage variable selection algorithm (variable screening followed by penalised regression) to identify genetic factors associated with drug response and to estimate their effects across dosages.
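The two-stage idea (variable screening followed by penalised regression) can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes a single dose level rather than dose-varying coefficients, uses marginal-correlation screening and a lasso penalty as stand-ins for the unspecified screening and penalisation steps, and runs on simulated data. All function names, the `keep` and `lam` parameters, and the simulation settings are hypothetical.

```python
import math
import random


def standardise(column):
    """Centre a column and scale it to unit variance."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    # Constant columns carry no information; map them to zeros.
    return [(v - mean) / sd if sd > 0 else 0.0 for v in column]


def screen(columns, y, keep):
    """Stage 1: keep the covariates with the largest absolute
    marginal correlation with the response."""
    n = len(y)
    scores = [abs(sum(a * b for a, b in zip(col, y)) / n) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(columns, y, lam, sweeps=200):
    """Stage 2: lasso by cyclic coordinate descent.
    Assumes every column has already been standardised."""
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Correlation of column j with the partial residual.
            rho = sum(c * r for c, r in zip(col, residual)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                residual = [r - delta * c for r, c in zip(residual, col)]
                beta[j] = new
    return beta


def two_stage_select(columns, y, keep, lam):
    """Screen first, then fit the lasso on the surviving covariates.
    Returns {original column index: non-zero standardised coefficient}."""
    cols = [standardise(c) for c in columns]
    yc = standardise(y)
    kept = screen(cols, yc, keep)
    beta = lasso([cols[j] for j in kept], yc, lam)
    return {j: b for j, b in zip(kept, beta) if b != 0.0}


# Simulated example: 100 samples, 200 covariates, two of which matter.
rng = random.Random(0)
n, p = 100, 200
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * columns[3][i] - 1.5 * columns[42][i] + rng.gauss(0, 0.5)
     for i in range(n)]

selected = two_stage_select(columns, y, keep=20, lam=0.2)
print(sorted(selected))
```

Screening first reduces the number of covariates from `p` to `keep`, so the penalised fit only has to handle a problem of manageable size. In a dose-varying setting, one would expect the coefficients to be functions of dose rather than scalars, which this sketch does not attempt.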

The second part of this work is motivated by the English Longitudinal Study of Ageing data set. The longitudinal data available in this study are characterised by irregularly collected and often incomplete trajectories, and by many ordinal response variables that measure only a small number of ageing domains (the data are derived from multiple questionnaires measuring multiple aspects of older people's lives). The ultimate aim is to understand the dynamics of ageing and to study the interrelationships between the factors associated with it. To do so, we first explore the theoretical foundations of ageing and the data set itself. Next, we adopt and extend the methodological framework of \cite{dawson2018} to estimate the quantile dynamics and derive predictions for frailty, a common surrogate of ageing, addressing the problem of incomplete individual responses over the age interval of interest. Finally, we develop a bivariate Gaussian process framework for ordinal and potentially irregularly sampled data, which allows the available questionnaire responses to be modelled directly. Here, the unobserved ageing domains are assumed to be smooth functions of age. This method allows the interrelationships between several ageing domains to be assessed after adjusting for individual variation across the observed longitudinal trajectories.
