Identifying variables responsible for clustering in discriminant analysis of data from infrared microspectroscopy of a biological sample.

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Identifying variables responsible for clustering in discriminant analysis of data from infrared microspectroscopy of a biological sample. / Martin, Francis L.; German, Matthew; Wit, Ernst et al.
In: Journal of Computational Biology, Vol. 14, No. 9, 11.2007, p. 1176-1184.

Vancouver

Martin FL, German M, Wit E, Fearn T, Ragavan N, Pollock HM. Identifying variables responsible for clustering in discriminant analysis of data from infrared microspectroscopy of a biological sample. Journal of Computational Biology. 2007 Nov;14(9):1176-1184. doi: 10.1089/cmb.2007.0057

Author

Martin, Francis L. ; German, Matthew ; Wit, Ernst et al. / Identifying variables responsible for clustering in discriminant analysis of data from infrared microspectroscopy of a biological sample. In: Journal of Computational Biology. 2007 ; Vol. 14, No. 9. pp. 1176-1184.

Bibtex

@article{b70c91387b4b4a2b9461a7a2033b3e81,
title = "Identifying variables responsible for clustering in discriminant analysis of data from infrared microspectroscopy of a biological sample.",
abstract = "In the biomedical field, infrared (IR) spectroscopic studies can involve the processing of data derived from many samples, divided into classes such as category of tissue (e.g., normal or cancerous) or patient identity. We require reliable methods to identify the class-specific information on which of the wavenumbers, representing various molecular groups, are responsible for observed class groupings. Employing a prostate tissue sample divided into three regions (transition zone, peripheral zone, and adjacent adenocarcinoma), and interrogated using synchrotron Fourier-transform IR microspectroscopy, we compared two statistical methods: (a) a new “cluster vector” version of principal component analysis (PCA) in which the dimensions of the dataset are reduced, followed by linear discriminant analysis (LDA) to reveal clusters, through each of which a vector is constructed that identifies the contributory wavenumbers; and (b) stepwise LDA, which exploits the fact that spectral peaks which identify certain chemical bonds extend over several wavenumbers, and which following classification via either one or two wavenumbers, checks whether the resulting predictions are stable across a range of nearby wavenumbers. Stepwise LDA is the simpler of the two methods; the cluster vector approach can indicate which of the different classes of spectra exhibit the significant differences in signal seen at the “prominent” wavenumbers identified. In situations where IR spectra are found to separate into classes, the excellent agreement between the two quite different methods points to what will prove to be a new and reliable approach to establishing which molecular groups are responsible for such separation.",
keywords = "adenocarcinoma, biomedical, clustering, LDA, microspectroscopy, misclassification",
author = "Martin, {Francis L.} and Matthew German and Ernst Wit and Thomas Fearn and Narasimhan Ragavan and Pollock, {Hubert M.}",
year = "2007",
month = nov,
doi = "10.1089/cmb.2007.0057",
language = "English",
volume = "14",
pages = "1176--1184",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc.",
number = "9",
}

RIS

TY - JOUR

T1 - Identifying variables responsible for clustering in discriminant analysis of data from infrared microspectroscopy of a biological sample.

AU - Martin, Francis L.

AU - German, Matthew

AU - Wit, Ernst

AU - Fearn, Thomas

AU - Ragavan, Narasimhan

AU - Pollock, Hubert M.

PY - 2007/11

Y1 - 2007/11

AB - In the biomedical field, infrared (IR) spectroscopic studies can involve the processing of data derived from many samples, divided into classes such as category of tissue (e.g., normal or cancerous) or patient identity. We require reliable methods to identify the class-specific information on which of the wavenumbers, representing various molecular groups, are responsible for observed class groupings. Employing a prostate tissue sample divided into three regions (transition zone, peripheral zone, and adjacent adenocarcinoma), and interrogated using synchrotron Fourier-transform IR microspectroscopy, we compared two statistical methods: (a) a new “cluster vector” version of principal component analysis (PCA) in which the dimensions of the dataset are reduced, followed by linear discriminant analysis (LDA) to reveal clusters, through each of which a vector is constructed that identifies the contributory wavenumbers; and (b) stepwise LDA, which exploits the fact that spectral peaks which identify certain chemical bonds extend over several wavenumbers, and which following classification via either one or two wavenumbers, checks whether the resulting predictions are stable across a range of nearby wavenumbers. Stepwise LDA is the simpler of the two methods; the cluster vector approach can indicate which of the different classes of spectra exhibit the significant differences in signal seen at the “prominent” wavenumbers identified. In situations where IR spectra are found to separate into classes, the excellent agreement between the two quite different methods points to what will prove to be a new and reliable approach to establishing which molecular groups are responsible for such separation.

KW - adenocarcinoma

KW - biomedical

KW - clustering

KW - LDA

KW - microspectroscopy

KW - misclassification

U2 - 10.1089/cmb.2007.0057

DO - 10.1089/cmb.2007.0057

M3 - Journal article

VL - 14

SP - 1176

EP - 1184

JO - Journal of Computational Biology

JF - Journal of Computational Biology

SN - 1066-5277

IS - 9

ER -