Accepted author manuscript, 3.17 MB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
Integrating human and machine intelligence in galaxy morphology classification tasks. / Beck, Melanie R; Scarlata, Claudia; Fortson, Lucy F et al.
In: Monthly Notices of the Royal Astronomical Society, Vol. 476, No. 4, 01.06.2018, p. 5516-5534.
TY - JOUR
T1 - Integrating human and machine intelligence in galaxy morphology classification tasks
AU - Beck, Melanie R
AU - Scarlata, Claudia
AU - Fortson, Lucy F
AU - Lintott, Chris J
AU - Simmons, B D
AU - Galloway, Melanie A
AU - Willett, Kyle W
AU - Dickinson, Hugh
AU - Masters, Karen L
AU - Marshall, Philip J
AU - Wright, Darryl
N1 - This is a pre-copy-editing, author-produced PDF of an article accepted for publication in Monthly Notices of the Royal Astronomical Society following peer review.
PY - 2018/6/1
Y1 - 2018/6/1
N2 - Quantifying galaxy morphology is a challenging yet scientifically rewarding task. As the scale of data continues to increase with upcoming surveys, traditional classification methods will struggle to handle the load. We present a solution through an integration of visual and automated classifications, preserving the best features of both human and machine. We demonstrate the effectiveness of such a system through a re-analysis of visual galaxy morphology classifications collected during the Galaxy Zoo 2 (GZ2) project. We reprocess the top-level question of the GZ2 decision tree with a Bayesian classification aggregation algorithm dubbed SWAP, originally developed for the Space Warps gravitational lens project. Through a simple binary classification scheme, we increase the classification rate nearly 5-fold classifying 226 124 galaxies in 92 d of GZ2 project time while reproducing labels derived from GZ2 classification data with 95.7 per cent accuracy. We next combine this with a Random Forest machine learning algorithm that learns on a suite of non-parametric morphology indicators widely used for automated morphologies. We develop a decision engine that delegates tasks between human and machine and demonstrate that the combined system provides at least a factor of 8 increase in the classification rate, classifying 210 803 galaxies in just 32 d of GZ2 project time with 93.1 per cent accuracy. As the Random Forest algorithm requires a minimal amount of computational cost, this result has important implications for galaxy morphology identification tasks in the era of Euclid and other large-scale surveys.
U2 - 10.1093/mnras/sty503
DO - 10.1093/mnras/sty503
M3 - Journal article
VL - 476
SP - 5516
EP - 5534
JO - Monthly Notices of the Royal Astronomical Society
JF - Monthly Notices of the Royal Astronomical Society
SN - 0035-8711
IS - 4
ER -