Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models. / Lee, Min; Mitra, Robin.
In: Computational Statistics and Data Analysis, Vol. 95, 01.03.2016, p. 24-38.

Vancouver

Lee M, Mitra R. Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models. Computational Statistics and Data Analysis. 2016 Mar 1;95:24-38. Epub 2015 Sep 9. doi: 10.1016/j.csda.2015.08.004

Bibtex

@article{bc73c2e594324c579afcc0ed05ce981e,
title = "Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models",
abstract = "Multiple imputation is a commonly used approach to deal with missing values. In this approach, an imputer repeatedly imputes the missing values by taking draws from the posterior predictive distribution for the missing values conditional on the observed values, and releases these completed data sets to analysts. With each completed data set the analyst performs the analysis of interest, treating the data as if it were fully observed. These analyses are then combined with standard combining rules, allowing the analyst to make appropriate inferences which take into account the uncertainty present due to the missing data. In order to preserve the statistical properties present in the data, the imputer must use a plausible distribution to generate the imputed values. In data sets containing variables with different measurement scales, e.g. some categorical and some continuous variables, this is a challenging problem. A method is proposed to multiply impute missing values in such data sets by modelling the joint distribution of the variables in the data through a sequence of generalised linear models, and data augmentation methods are used to draw imputations from a proper posterior distribution using Markov Chain Monte Carlo (MCMC). The performance of the proposed method is illustrated using simulation studies and on a data set taken from a breast feeding study.",
keywords = "data augmentation, latent variable, missing data, multiple imputation",
author = "Min Lee and Robin Mitra",
year = "2016",
month = mar,
day = "1",
doi = "10.1016/j.csda.2015.08.004",
language = "English",
volume = "95",
pages = "24--38",
journal = "Computational Statistics and Data Analysis",
issn = "0167-9473",
publisher = "Elsevier",
}

RIS

TY - JOUR
T1 - Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models
AU - Lee, Min
AU - Mitra, Robin
PY - 2016/3/1
Y1 - 2016/3/1
AB - Multiple imputation is a commonly used approach to deal with missing values. In this approach, an imputer repeatedly imputes the missing values by taking draws from the posterior predictive distribution for the missing values conditional on the observed values, and releases these completed data sets to analysts. With each completed data set the analyst performs the analysis of interest, treating the data as if it were fully observed. These analyses are then combined with standard combining rules, allowing the analyst to make appropriate inferences which take into account the uncertainty present due to the missing data. In order to preserve the statistical properties present in the data, the imputer must use a plausible distribution to generate the imputed values. In data sets containing variables with different measurement scales, e.g. some categorical and some continuous variables, this is a challenging problem. A method is proposed to multiply impute missing values in such data sets by modelling the joint distribution of the variables in the data through a sequence of generalised linear models, and data augmentation methods are used to draw imputations from a proper posterior distribution using Markov Chain Monte Carlo (MCMC). The performance of the proposed method is illustrated using simulation studies and on a data set taken from a breast feeding study.

KW - data augmentation
KW - latent variable
KW - missing data
KW - multiple imputation
U2 - 10.1016/j.csda.2015.08.004
DO - 10.1016/j.csda.2015.08.004
M3 - Journal article
VL - 95
SP - 24
EP - 38
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
SN - 0167-9473
ER -