Rights statement: This is the author’s version of a work that was accepted for publication in Schizophrenia Research. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Schizophrenia Research 214, 2018 DOI: 10.1016/j.schres.2017.11.038
Accepted author manuscript, 1.11 MB, PDF document
Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
| Field | Value |
|---|---|
| Journal publication date | 1/12/2019 |
| Journal | Schizophrenia Research |
| Volume | 214 |
| Number of pages | 8 |
| Pages (from-to) | 3-10 |
| Publication Status | Published |
| Early online date | 21/12/2017 |
| Original language | English |
Machine learning is a powerful tool that has previously been used to distinguish schizophrenia (SZ) patients from healthy controls (HC) using magnetic resonance images. Each study, however, uses different datasets, classification algorithms, and validation techniques. Here, we perform a critical appraisal of the accuracy of machine learning methodologies used in SZ/HC classification studies by comparing three machine learning algorithms (logistic regression [LR], support vector machines [SVMs], and linear discriminant analysis [LDA]) on three independent datasets (435 subjects total) using two tissue density estimates and cortical thickness (CT). Performance is assessed using 10-fold cross-validation as well as a held-out validation set. Classification using CT outperformed classification using tissue densities, but there was no clear effect of dataset. LR, SVMs, and LDA each yielded the highest accuracy for a different feature set and validation paradigm, but most accuracies were between 55% and 70%, well below previously reported values. The highest accuracy achieved was 73.5%, using CT data and an SVM. Taken together, these results illustrate some of the obstacles to constructing effective disease classifiers, and suggest that tissue densities and CT may not be sufficiently sensitive for SZ/HC classification given currently available methodologies and sample sizes.
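The comparison described in the abstract (three classifiers, each scored by 10-fold cross-validation and on a held-out set) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes scikit-learn, uses synthetic features in place of tissue density or cortical thickness measurements, and picks a linear SVM kernel, default hyperparameters, and an 80/20 split purely for illustration. The accuracies it prints say nothing about the study's results.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for imaging features (NOT the study's data). The sample
# count matches the abstract's 435 subjects; everything else is arbitrary.
X, y = make_classification(n_samples=435, n_features=50, n_informative=5,
                           class_sep=0.5, random_state=0)

# Hold out 20% of subjects for a final check (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scaling sits inside each pipeline so it is refit on every training fold,
# which keeps information from the validation fold out of the scaler.
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "LDA": make_pipeline(StandardScaler(), LinearDiscriminantAnalysis()),
}

# Stratified folds keep the class balance roughly equal across the 10 folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    # Mean accuracy over the 10 validation folds of the training portion.
    cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
    # Refit on the full training portion, then score once on held-out subjects.
    held_out_acc = model.fit(X_train, y_train).score(X_test, y_test)
    results[name] = {"cv_mean": float(cv_scores.mean()),
                     "held_out": float(held_out_acc)}
    print(f"{name}: 10-fold CV accuracy = {cv_scores.mean():.3f}, "
          f"held-out accuracy = {held_out_acc:.3f}")
```

Reporting both numbers side by side is useful because cross-validated accuracy alone can be optimistic when any modelling choice was tuned on the same folds, whereas the held-out subjects are touched only once.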