- Sperrinthesis final post
Accepted author manuscript, 2.79 MB, PDF document

Research output: Thesis › Doctoral Thesis

Published

**Statistical methodology motivated by problems in genetics.** / Sperrin, Matthew.


Sperrin, M 2010, 'Statistical methodology motivated by problems in genetics', PhD, Lancaster University, Lancaster.

Sperrin, M. (2010). *Statistical methodology motivated by problems in genetics*. Lancaster University.

Sperrin M. Statistical methodology motivated by problems in genetics. Lancaster: Lancaster University, 2010. 111 p.

@phdthesis{3892711b94534b8cb0f80ac07877dcef,

title = "Statistical methodology motivated by problems in genetics",

abstract = "Sequencing the human genome has made vast amounts of potentially useful genetic data accessible. An important challenge in statistics is to develop methodology to extract information from this data. In this thesis, developments are made in two methodological areas that have wide applications in genetics. First, probabilistic methods to deal with the label switching problem in Bayesian mixture models are introduced. Mixture models are used in situations where populations may consist of a number of sub-populations, or as a semi-parametric modelling tool. The label switching problem can prevent meaningful interpretation of the output of Markov Chain Monte Carlo samplers. Specifically, inference on attributes specific to sub-populations can be difficult. Such attributes play an important role in understanding genetic effects. We introduce probabilistic relabelling strategies as a natural way of overcoming the label switching problem, and compare with existing strategies. The comparisons demonstrate that the advantages offered by probabilistic strategies come without loss in parameter estimation ability. Second, we introduce direct effect testing (DET), which is a novel method that distinguishes direct from indirect effects between binary predictors and a binary response. DET consists of two stages: the first stage finds effects, the second stage infers the uncertainty in determining which predictors cause which effects. The method is useful when it is of interest to recover direct effects between a large number of predictors and the response. This is a common goal in genetics, where we are interested in the effects of variations in the genome on the prevalence of a phenotype. This work includes detailed simulations, comparing the ability of a number of methods at recovering direct effects. DET outperforms existing methods at recovering direct effects in situations where there is high correlation between predictors, and matches their performance when the correlation is moderate or small.",

author = "Matthew Sperrin",

year = "2010",

language = "English",

publisher = "Lancaster University",

school = "Lancaster University",

}

TY - THES

T1 - Statistical methodology motivated by problems in genetics

AU - Sperrin, Matthew

PY - 2010

Y1 - 2010

N2 - Sequencing the human genome has made vast amounts of potentially useful genetic data accessible. An important challenge in statistics is to develop methodology to extract information from this data. In this thesis, developments are made in two methodological areas that have wide applications in genetics. First, probabilistic methods to deal with the label switching problem in Bayesian mixture models are introduced. Mixture models are used in situations where populations may consist of a number of sub-populations, or as a semi-parametric modelling tool. The label switching problem can prevent meaningful interpretation of the output of Markov Chain Monte Carlo samplers. Specifically, inference on attributes specific to sub-populations can be difficult. Such attributes play an important role in understanding genetic effects. We introduce probabilistic relabelling strategies as a natural way of overcoming the label switching problem, and compare with existing strategies. The comparisons demonstrate that the advantages offered by probabilistic strategies come without loss in parameter estimation ability. Second, we introduce direct effect testing (DET), which is a novel method that distinguishes direct from indirect effects between binary predictors and a binary response. DET consists of two stages: the first stage finds effects, the second stage infers the uncertainty in determining which predictors cause which effects. The method is useful when it is of interest to recover direct effects between a large number of predictors and the response. This is a common goal in genetics, where we are interested in the effects of variations in the genome on the prevalence of a phenotype. This work includes detailed simulations, comparing the ability of a number of methods at recovering direct effects. DET outperforms existing methods at recovering direct effects in situations where there is high correlation between predictors, and matches their performance when the correlation is moderate or small.

AB - Sequencing the human genome has made vast amounts of potentially useful genetic data accessible. An important challenge in statistics is to develop methodology to extract information from this data. In this thesis, developments are made in two methodological areas that have wide applications in genetics. First, probabilistic methods to deal with the label switching problem in Bayesian mixture models are introduced. Mixture models are used in situations where populations may consist of a number of sub-populations, or as a semi-parametric modelling tool. The label switching problem can prevent meaningful interpretation of the output of Markov Chain Monte Carlo samplers. Specifically, inference on attributes specific to sub-populations can be difficult. Such attributes play an important role in understanding genetic effects. We introduce probabilistic relabelling strategies as a natural way of overcoming the label switching problem, and compare with existing strategies. The comparisons demonstrate that the advantages offered by probabilistic strategies come without loss in parameter estimation ability. Second, we introduce direct effect testing (DET), which is a novel method that distinguishes direct from indirect effects between binary predictors and a binary response. DET consists of two stages: the first stage finds effects, the second stage infers the uncertainty in determining which predictors cause which effects. The method is useful when it is of interest to recover direct effects between a large number of predictors and the response. This is a common goal in genetics, where we are interested in the effects of variations in the genome on the prevalence of a phenotype. This work includes detailed simulations, comparing the ability of a number of methods at recovering direct effects. DET outperforms existing methods at recovering direct effects in situations where there is high correlation between predictors, and matches their performance when the correlation is moderate or small.

M3 - Doctoral Thesis

PB - Lancaster University

CY - Lancaster

ER -