Statistical methodology motivated by problems in genetics

Research output: Thesis › Doctoral Thesis

Published
  • Matthew Sperrin
Publication date: 2010
Number of pages: 111
Qualification: PhD
Awarding Institution
Supervisors/Advisors
Award date: 1/07/2010
Place of Publication: Lancaster
Publisher
  • Lancaster University
Original language: English

Abstract

Sequencing the human genome has made vast amounts of potentially useful genetic data accessible. An important challenge in statistics is to develop methodology to extract information from this data. In this thesis, developments are made in two methodological areas that have wide applications in genetics.
First, probabilistic methods to deal with the label switching problem in Bayesian mixture models are introduced. Mixture models are used in situations where populations may consist of a number of sub-populations, or as a semi-parametric modelling tool. The label switching problem can prevent meaningful interpretation of the output of Markov Chain Monte Carlo samplers. Specifically, inference on attributes specific to sub-populations can be difficult. Such attributes play an important role in understanding genetic effects. We introduce probabilistic relabelling strategies as a natural way of overcoming the label switching problem, and compare with existing strategies. The comparisons demonstrate that the advantages offered by probabilistic strategies come without loss in parameter estimation ability.
Second, we introduce direct effect testing (DET), which is a novel method that distinguishes direct from indirect effects between binary predictors and a binary response. DET consists of two stages: the first stage finds effects, the second stage infers the uncertainty in determining which predictors cause which effects. The method is useful when it is of interest to recover direct effects between a large number of predictors and the response.
This is a common goal in genetics, where we are interested in the effects of variations in the genome on the prevalence of a phenotype. This work includes detailed simulations, comparing the ability of a number of methods at recovering direct effects. DET outperforms existing methods at recovering direct effects in situations where there is high correlation between predictors, and matches their performance when the correlation is moderate or small.