Rights statement: © 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated
Final published version, 466 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - Tilting the lasso by knowledge-based post-processing
AU - Tharmaratnam, Kukatharmini
AU - Sperrin, Matthew
AU - Jaki, Thomas Friedrich
AU - Reppe, Sjur
AU - Frigessi, Arnoldo
PY - 2016/9/2
Y1 - 2016/9/2
N2 - Background: It is useful to incorporate biological knowledge on the role of genetic determinants in predicting an outcome. It is, however, not always feasible to fully elicit this information when the number of determinants is large. We present an approach to overcome this difficulty. First, using half of the available data, a shortlist of potentially interesting determinants is generated. Second, binary indications of biological importance are elicited for this much smaller number of determinants. Third, an analysis is carried out on this shortlist using the second half of the data. Results: We show through simulations that, compared with adaptive lasso, this approach leads to models containing more biologically relevant variables, while the prediction mean squared error (PMSE) is comparable or even reduced. We also apply our approach to bone mineral density data, and again final models contain more biologically relevant variables and have reduced PMSEs. Conclusion: Our method leads to comparable or improved predictive performance, and models with greater face validity and interpretability with feasible incorporation of biological knowledge into predictive models.
AB - Background: It is useful to incorporate biological knowledge on the role of genetic determinants in predicting an outcome. It is, however, not always feasible to fully elicit this information when the number of determinants is large. We present an approach to overcome this difficulty. First, using half of the available data, a shortlist of potentially interesting determinants is generated. Second, binary indications of biological importance are elicited for this much smaller number of determinants. Third, an analysis is carried out on this shortlist using the second half of the data. Results: We show through simulations that, compared with adaptive lasso, this approach leads to models containing more biologically relevant variables, while the prediction mean squared error (PMSE) is comparable or even reduced. We also apply our approach to bone mineral density data, and again final models contain more biologically relevant variables and have reduced PMSEs. Conclusion: Our method leads to comparable or improved predictive performance, and models with greater face validity and interpretability with feasible incorporation of biological knowledge into predictive models.
KW - Bone mineral density
KW - Elicitation
KW - Lasso
U2 - 10.1186/s12859-016-1210-7
DO - 10.1186/s12859-016-1210-7
M3 - Journal article
VL - 17
JO - BMC Bioinformatics
JF - BMC Bioinformatics
SN - 1471-2105
M1 - 344
ER -
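
The three-step procedure summarised in the abstract (shortlist on one half of the data, elicit binary indications for the shortlist only, re-analyse on the other half) can be sketched roughly as below. This is an illustrative sketch, not the authors' implementation: the abstract does not give the form of the penalty, so the plain lasso used for shortlisting, the `discount` factor that lowers the penalty on variables flagged as biologically important, the penalty values, and all function names are assumptions made here for illustration.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used in lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights=None, n_iter=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam * sum_j w_j*|b_j|.

    X is a list of rows, y a list of responses. Columns are assumed to be
    roughly centred and scaled; no intercept is fitted (assumption).
    """
    n = len(X)
    p = len(X[0]) if n else 0
    w = weights if weights is not None else [1.0] * p
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, lam * w[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def shortlist_elicit_refit(X, y, is_relevant, lam1=0.1, lam2=0.1, discount=0.5):
    """Hypothetical three-step pipeline; returns {column index: coefficient}.

    is_relevant(j) stands in for the expert's binary indication of
    biological importance and is only called for shortlisted columns.
    """
    half = len(X) // 2
    X1, y1, X2, y2 = X[:half], y[:half], X[half:], y[half:]

    # Step 1: shortlist determinants using the first half of the data.
    beta1 = weighted_lasso(X1, y1, lam1)
    shortlist = [j for j, b in enumerate(beta1) if b != 0.0]

    # Step 2: elicit binary indications for the shortlist only.
    flags = [bool(is_relevant(j)) for j in shortlist]

    # Step 3: re-analyse the shortlist on the second half, penalising
    # flagged variables less (the discount weighting is an assumption).
    weights = [discount if flag else 1.0 for flag in flags]
    X2_short = [[row[j] for j in shortlist] for row in X2]
    beta2 = weighted_lasso(X2_short, y2, lam2, weights)
    return {j: b for j, b in zip(shortlist, beta2) if b != 0.0}
```

Calling `shortlist_elicit_refit(X, y, is_relevant=lambda j: j in known_genes)` with a hypothetical set `known_genes` returns the retained column indices and their coefficients. Because the expert is consulted only for shortlisted columns, the elicitation effort scales with the shortlist rather than with the full number of determinants, which is the point the abstract makes.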