Electronic data

  • 1802.08713

    Accepted author manuscript, 3.17 MB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License


Integrating human and machine intelligence in galaxy morphology classification tasks

Research output: Contribution to Journal/Magazine › Journal article › peer-review

  • Melanie R Beck
  • Claudia Scarlata
  • Lucy F Fortson
  • Chris J Lintott
  • B D Simmons
  • Melanie A Galloway
  • Kyle W Willett
  • Hugh Dickinson
  • Karen L Masters
  • Philip J Marshall
  • Darryl Wright
Journal publication date: 1/06/2018
Journal: Monthly Notices of the Royal Astronomical Society
Issue number: 4
Number of pages: 19
Pages (from-to): 5516-5534
Publication status: Published
Original language: English


Quantifying galaxy morphology is a challenging yet scientifically rewarding task. As the scale of data continues to increase with upcoming surveys, traditional classification methods will struggle to handle the load. We present a solution through an integration of visual and automated classifications, preserving the best features of both human and machine. We demonstrate the effectiveness of such a system through a re-analysis of visual galaxy morphology classifications collected during the Galaxy Zoo 2 (GZ2) project. We reprocess the top-level question of the GZ2 decision tree with a Bayesian classification aggregation algorithm dubbed SWAP, originally developed for the Space Warps gravitational lens project. Through a simple binary classification scheme, we increase the classification rate nearly 5-fold, classifying 226 124 galaxies in 92 d of GZ2 project time while reproducing labels derived from GZ2 classification data with 95.7 per cent accuracy. We next combine this with a Random Forest machine learning algorithm trained on a suite of non-parametric morphology indicators widely used for automated morphologies. We develop a decision engine that delegates tasks between human and machine and demonstrate that the combined system provides at least a factor of 8 increase in the classification rate, classifying 210 803 galaxies in just 32 d of GZ2 project time with 93.1 per cent accuracy. As the Random Forest algorithm incurs minimal computational cost, this result has important implications for galaxy morphology identification tasks in the era of Euclid and other large-scale surveys.
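The Bayesian aggregation idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the SWAP code itself: the function names `update_posterior` and `classify`, the prior of 0.5, the retirement thresholds of 0.99 and 0.01, and the per-volunteer accuracies are all illustrative assumptions, not values taken from the paper. The sketch only shows the general technique of updating a subject's binary-label probability with Bayes' theorem after each vote and retiring the subject once the probability crosses a threshold.

```python
def update_posterior(prior, vote, p_true_pos, p_true_neg):
    """Apply Bayes' theorem for one binary vote on one subject.

    prior      -- current P(subject is positive)
    vote       -- True if the volunteer voted "positive"
    p_true_pos -- assumed P(volunteer votes positive | subject is positive)
    p_true_neg -- assumed P(volunteer votes negative | subject is negative)
    Both accuracies are assumed to lie strictly between 0 and 1.
    """
    if vote:
        like_pos = p_true_pos
        like_neg = 1.0 - p_true_neg
    else:
        like_pos = 1.0 - p_true_pos
        like_neg = p_true_neg
    numerator = like_pos * prior
    return numerator / (numerator + like_neg * (1.0 - prior))


def classify(votes, prior=0.5, accept=0.99, reject=0.01):
    """Process votes in order and retire the subject at the first threshold crossing.

    votes -- iterable of (vote, p_true_pos, p_true_neg) tuples
    Returns (label, posterior, number_of_votes_used).
    The prior and thresholds are illustrative assumptions.
    """
    posterior = prior
    used = 0
    for vote, p_true_pos, p_true_neg in votes:
        posterior = update_posterior(posterior, vote, p_true_pos, p_true_neg)
        used += 1
        if posterior >= accept:
            return "positive", posterior, used
        if posterior <= reject:
            return "negative", posterior, used
    return "undecided", posterior, used


if __name__ == "__main__":
    # Ten hypothetical volunteers, each assumed 80 per cent accurate, all voting "positive".
    label, posterior, used = classify([(True, 0.8, 0.8)] * 10)
    print(label, round(posterior, 4), used)
```

In this toy setup each agreeing vote multiplies the odds by 4, so the subject is retired after four votes instead of waiting for all ten. Retiring subjects early in this way is one plausible reason an aggregation scheme of this kind could raise the classification rate; the rates quoted in the abstract depend on details of the real system that this sketch does not model.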

Bibliographic note

This is a pre-copy-editing, author-produced PDF of an article accepted for publication in Monthly Notices of the Royal Astronomical Society following peer review.