Understanding the effects of dichotomization of continuous outcomes on geostatistical inference

Data Science Institute

Associated organisational unit

DSI - Health

Electronic data

Rights statement: This is the author’s version of a work that was accepted for publication in Spatial Statistics. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Spatial Statistics, 42, 2021 DOI: 10.1016/j.spasta.2020.100424
Accepted author manuscript, 583 KB, PDF document
Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Text available via DOI:

https://doi.org/10.1016/j.spasta.2020.100424
Final published version

Keywords

Binary data, Dichotomization, Disease mapping, Linear geostatistical model, Model-based geostatistics, Prevalence

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Understanding the effects of dichotomization of continuous outcomes on geostatistical inference. / Kyomuhangi, Irene; Abeku, Tarekegn A; Kirby, Matthew J et al.
In: Spatial Statistics, Vol. 42, 100424, 30.04.2021.

Harvard

Kyomuhangi, I, Abeku, TA, Kirby, MJ, Tesfaye, G & Giorgi, E 2021, 'Understanding the effects of dichotomization of continuous outcomes on geostatistical inference', Spatial Statistics, vol. 42, 100424. https://doi.org/10.1016/j.spasta.2020.100424

APA

Kyomuhangi, I., Abeku, T. A., Kirby, M. J., Tesfaye, G., & Giorgi, E. (2021). Understanding the effects of dichotomization of continuous outcomes on geostatistical inference. Spatial Statistics, 42, Article 100424. https://doi.org/10.1016/j.spasta.2020.100424

Vancouver

Kyomuhangi I, Abeku TA, Kirby MJ, Tesfaye G, Giorgi E. Understanding the effects of dichotomization of continuous outcomes on geostatistical inference. Spatial Statistics. 2021 Apr 30;42:100424. Epub 2020 Feb 28. doi: 10.1016/j.spasta.2020.100424

Author

Kyomuhangi, Irene ; Abeku, Tarekegn A ; Kirby, Matthew J et al. / Understanding the effects of dichotomization of continuous outcomes on geostatistical inference. In: Spatial Statistics. 2021 ; Vol. 42.

Bibtex

@article{9fd34d242b0047dc8285f46d8e95becc,

title = "Understanding the effects of dichotomization of continuous outcomes on geostatistical inference",

abstract = "Diagnosis is often based on the exceedance or not of continuous health indicators of a predefined cut-off value, so as to classify patients into positives and negatives for the disease under investigation. In this paper, we investigate the effects of dichotomization of spatially-referenced continuous outcome variables on geostatistical inference. Although this issue has been extensively studied in other fields, dichotomization is still a common practice in epidemiological studies. Furthermore, the effects of this practice in the context of prevalence mapping have not been fully understood. Here, we demonstrate how spatial correlation affects the loss of information due to dichotomization, how linear geostatistical models can be used to map disease prevalence and thus avoid dichotomization, and finally, how dichotomization affects our predictive inference on prevalence. To pursue these objectives, we develop a metric, based on the composite likelihood, which can be used to quantify the potential loss of information after dichotomization without requiring the fitting of Binomial geostatistical models. Through a simulation study and two applications on disease mapping in Africa, we show that, as thresholds used for dichotomization move further away from the mean of the underlying process, the performance of binomial geostatistical models deteriorates substantially. We also find that dichotomization can lead to the loss of fine scale features of disease prevalence and increased uncertainty in the parameter estimates, especially in the presence of a large noise to signal ratio. These findings strongly support the conclusions from previous studies that dichotomization should be always avoided whenever feasible.",

keywords = "Binary data, Dichotomization, Disease mapping, Linear geostatistical model, Model-based geostatistics, Prevalence",

author = "Irene Kyomuhangi and Abeku, {Tarekegn A} and Kirby, {Matthew J} and Gezahegn Tesfaye and Emanuele Giorgi",

note = "This is the author{\textquoteright}s version of a work that was accepted for publication in Spatial Statistics. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Spatial Statistics, 42, 2021 DOI: 10.1016/j.spasta.2020.100424",

year = "2021",

month = apr,

day = "30",

doi = "10.1016/j.spasta.2020.100424",

language = "English",

volume = "42",

journal = "Spatial Statistics",

issn = "2211-6753",

publisher = "Elsevier BV",

}

RIS

TY  - JOUR
T1  - Understanding the effects of dichotomization of continuous outcomes on geostatistical inference
AU  - Kyomuhangi, Irene
AU  - Abeku, Tarekegn A
AU  - Kirby, Matthew J
AU  - Tesfaye, Gezahegn
AU  - Giorgi, Emanuele
N1  - This is the author’s version of a work that was accepted for publication in Spatial Statistics. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Spatial Statistics, 42, 2021 DOI: 10.1016/j.spasta.2020.100424
PY  - 2021/4/30
Y1  - 2021/4/30
AB  - Diagnosis is often based on the exceedance or not of continuous health indicators of a predefined cut-off value, so as to classify patients into positives and negatives for the disease under investigation. In this paper, we investigate the effects of dichotomization of spatially-referenced continuous outcome variables on geostatistical inference. Although this issue has been extensively studied in other fields, dichotomization is still a common practice in epidemiological studies. Furthermore, the effects of this practice in the context of prevalence mapping have not been fully understood. Here, we demonstrate how spatial correlation affects the loss of information due to dichotomization, how linear geostatistical models can be used to map disease prevalence and thus avoid dichotomization, and finally, how dichotomization affects our predictive inference on prevalence. To pursue these objectives, we develop a metric, based on the composite likelihood, which can be used to quantify the potential loss of information after dichotomization without requiring the fitting of Binomial geostatistical models. Through a simulation study and two applications on disease mapping in Africa, we show that, as thresholds used for dichotomization move further away from the mean of the underlying process, the performance of binomial geostatistical models deteriorates substantially. We also find that dichotomization can lead to the loss of fine scale features of disease prevalence and increased uncertainty in the parameter estimates, especially in the presence of a large noise to signal ratio. These findings strongly support the conclusions from previous studies that dichotomization should be always avoided whenever feasible.
KW  - Binary data
KW  - Dichotomization
KW  - Disease mapping
KW  - Linear geostatistical model
KW  - Model-based geostatistics
KW  - Prevalence
U2  - 10.1016/j.spasta.2020.100424
DO  - 10.1016/j.spasta.2020.100424
M3  - Journal article
VL  - 42
JO  - Spatial Statistics
JF  - Spatial Statistics
SN  - 2211-6753
M1  - 100424
ER  -
