Continuous measurements of health outcome data are often dichotomized into binary (i.e., positive/negative) data for diagnosis and subsequent statistical analysis. The disadvantages of dichotomizing continuous data for statistical inference are well established in the literature, yet the practice remains commonplace in health research.
In this thesis, we investigate the impact of dichotomizing data when the aim of the analysis is to determine disease prevalence and risk, and we propose solutions to some of the main challenges that dichotomization introduces in the context of global health research.
First, using model-based geostatistics, we show how dichotomization reduces the predictive performance of geostatistical models, both through loss of information and by reducing the reliability of parameter estimates. We demonstrate this through a simulation study and through applications mapping the prevalence and risk of anaemia in Ethiopia and of stunting in Ghana.
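As a generic illustration of the loss of information described above (a toy example, not the geostatistical simulation study of this thesis), the sketch below assumes independent, normally distributed measurements and a fixed diagnostic threshold. It compares two estimators of prevalence, defined as P(Y > threshold): the proportion of dichotomized observations that are positive, and the exceedance probability from a normal model fitted to the continuous values. The function name `compare_prevalence_estimators` and all parameter values are hypothetical choices made for this illustration.

```python
import random
from statistics import NormalDist, fmean, pvariance, stdev


def compare_prevalence_estimators(mu=0.0, sigma=1.0, threshold=1.0,
                                  n=100, reps=2000, seed=1):
    """Monte Carlo comparison of two estimators of P(Y > threshold).

    Assumes Y ~ Normal(mu, sigma) with independent observations
    (an illustrative assumption, not a model from the thesis).
    Returns the true prevalence and the sampling variance of each estimator.
    """
    rng = random.Random(seed)
    true_p = 1.0 - NormalDist(mu, sigma).cdf(threshold)
    binary_est, continuous_est = [], []
    for _ in range(reps):
        y = [rng.gauss(mu, sigma) for _ in range(n)]
        # Dichotomized data: keep only whether each value exceeds the threshold.
        binary_est.append(sum(v > threshold for v in y) / n)
        # Continuous data: fit a normal model, then compute the exceedance probability.
        fitted = NormalDist(fmean(y), stdev(y))
        continuous_est.append(1.0 - fitted.cdf(threshold))
    return true_p, pvariance(binary_est), pvariance(continuous_est)


true_p, var_bin, var_cont = compare_prevalence_estimators()
print(f"true prevalence: {true_p:.4f}")
print(f"variance, dichotomized estimator: {var_bin:.6f}")
print(f"variance, continuous estimator:   {var_cont:.6f}")
```

Under these assumptions, the estimator that uses the continuous values typically has a noticeably smaller sampling variance than the proportion computed from dichotomized data. The size of the gain depends on where the threshold falls and on the fitted model being appropriate; if the normality assumption fails, the continuous estimator can be biased.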
We then explore the limitations that dichotomization imposes on the estimation of malaria transmission in serology models, and propose a novel, flexible and unified modelling framework that uses continuous antibody measurements instead of dichotomized data to estimate transmission intensity. Using data from Western Kenya, we demonstrate the properties of this new approach.
Finally, we address the use of thresholds to dichotomize continuous antibody measurements when the goal is to estimate malaria seroprevalence. Building on the principles of the unified modelling framework, we develop a threshold-free approach to estimating seroprevalence. Using the same Western Kenyan dataset, we show that this new approach improves model fit and provides more consistent estimates than traditional methods.
Together, these investigations demonstrate the significant impact that dichotomizing continuous data has on statistical inference across different areas of health research, and indicate that this practice should be avoided where possible.