Through the Citizen Scientists' Eyes - Research Portal

Physics

Associated organisational unit

Observational Astrophysics

Text available via DOI:

https://doi.org/10.5334/cstp.740
Final published version
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Through the Citizen Scientists' Eyes: Insights into Using Citizen Science with Machine Learning for Effective Identification of Unknown-Unknowns in Big Data

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Through the Citizen Scientists' Eyes: Insights into Using Citizen Science with Machine Learning for Effective Identification of Unknown-Unknowns in Big Data. / Mantha, Kameswara Bharadwaj; Roberts, Hayley; Fortson, Lucy et al.
In: Citizen Science: Theory and Practice, Vol. 9, No. 1, 40, 09.12.2024.

Bibtex

@article{0d22e15bcb994e2d8ebde7e0dd1dd117,

title = "Through the Citizen Scientists' Eyes: Insights into Using Citizen Science with Machine Learning for Effective Identification of Unknown-Unknowns in Big Data",

abstract = "In the era of rapidly growing astronomical data, the gap between data collection and analysis is a significant barrier, especially for teams searching for rare scientific objects. Although machine learning (ML) can quickly parse large data sets, it struggles to robustly identify scientifically interesting objects, a task at which humans excel. Human-in-the-loop (HITL) strategies that combine the strengths of citizen science (CS) and ML offer a promising solution, but first, we need to better understand the relationship between human- and machine-identified samples. In this work, we present a case study from the Galaxy Zoo: Weird \& Wonderful project, where volunteers inspected {\textasciitilde}200,000 astronomical images---processed by an ML-based anomaly detection model---to identify those with unusual or interesting characteristics. Volunteer-selected images with common astrophysical characteristics had higher consensus, while rarer or more complex ones had lower consensus. This suggests low-consensus choices shouldn{\textquoteright}t be dismissed in further explorations. Additionally, volunteers were better at filtering out uninteresting anomalies, such as image artifacts, which the machine struggled with. We also found that a higher ML-generated anomaly score that indicates images{\textquoteright} low-level feature anomalousness was a better predictor of the volunteers{\textquoteright} consensus choice. Combining a locus of high volunteer-consensus images within the ML learnt feature space and anomaly score, we demonstrated a decision boundary that can effectively isolate images with unusual and potentially scientifically interesting characteristics. Using this case study, we lay important guidelines for future research studies looking to adapt and operationalize human-machine collaborative frameworks for efficient anomaly detection in big data.",

author = "Mantha, {Kameswara Bharadwaj} and Hayley Roberts and Lucy Fortson and Chris Lintott and Hugh Dickinson and William Keel and Ramanakumar Sankar and Coleman Krawczyk and Brooke Simmons and Mike Walmsley and Izzy Garland and Makechemu, {Jason Shingirai} and Laura Trouille and Clifford Johnson",

year = "2024",

month = dec,

day = "9",

doi = "10.5334/cstp.740",

language = "English",

volume = "9",

journal = "Citizen Science: Theory and Practice",

issn = "2057-4991",

publisher = "Ubiquity Press",

number = "1",

pages = "40",

}

RIS

TY - JOUR

T1 - Through the Citizen Scientists' Eyes

T2 - Insights into Using Citizen Science with Machine Learning for Effective Identification of Unknown-Unknowns in Big Data

AU - Mantha, Kameswara Bharadwaj

AU - Roberts, Hayley

AU - Fortson, Lucy

AU - Lintott, Chris

AU - Dickinson, Hugh

AU - Keel, William

AU - Sankar, Ramanakumar

AU - Krawczyk, Coleman

AU - Simmons, Brooke

AU - Walmsley, Mike

AU - Garland, Izzy

AU - Makechemu, Jason Shingirai

AU - Trouille, Laura

AU - Johnson, Clifford

PY - 2024/12/9

Y1 - 2024/12/9

N2 - In the era of rapidly growing astronomical data, the gap between data collection and analysis is a significant barrier, especially for teams searching for rare scientific objects. Although machine learning (ML) can quickly parse large data sets, it struggles to robustly identify scientifically interesting objects, a task at which humans excel. Human-in-the-loop (HITL) strategies that combine the strengths of citizen science (CS) and ML offer a promising solution, but first, we need to better understand the relationship between human- and machine-identified samples. In this work, we present a case study from the Galaxy Zoo: Weird & Wonderful project, where volunteers inspected ~200,000 astronomical images—processed by an ML-based anomaly detection model—to identify those with unusual or interesting characteristics. Volunteer-selected images with common astrophysical characteristics had higher consensus, while rarer or more complex ones had lower consensus. This suggests low-consensus choices shouldn’t be dismissed in further explorations. Additionally, volunteers were better at filtering out uninteresting anomalies, such as image artifacts, which the machine struggled with. We also found that a higher ML-generated anomaly score that indicates images’ low-level feature anomalousness was a better predictor of the volunteers’ consensus choice. Combining a locus of high volunteer-consensus images within the ML learnt feature space and anomaly score, we demonstrated a decision boundary that can effectively isolate images with unusual and potentially scientifically interesting characteristics. Using this case study, we lay important guidelines for future research studies looking to adapt and operationalize human-machine collaborative frameworks for efficient anomaly detection in big data.

U2 - 10.5334/cstp.740

DO - 10.5334/cstp.740

M3 - Journal article

VL - 9

JO - Citizen Science: Theory and Practice

JF - Citizen Science: Theory and Practice

SN - 2057-4991

IS - 1

M1 - 40

ER -
