Assessing crowdsourcing quality through objective tasks

Computing and Communications

Associated organisational unit

UCREL - University Centre for Computer Corpus Research on Language

Keywords

Mechanical Turk, Objective Metrics, Evaluation

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Assessing crowdsourcing quality through objective tasks. / Aker, Ahmet; El-Haj, Mahmoud; Albakour, M-Dyaa et al.
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). Istanbul, Turkey: European Language Resources Association (ELRA), 2012. p. 1456-1461.

Harvard

Aker, A, El-Haj, M, Albakour, M-D & Kruschwitz, U 2012, Assessing crowdsourcing quality through objective tasks. in Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). European Language Resources Association (ELRA), Istanbul, Turkey, pp. 1456-1461. <http://www.lrec-conf.org/proceedings/lrec2012/index.html>

APA

Aker, A., El-Haj, M., Albakour, M.-D., & Kruschwitz, U. (2012). Assessing crowdsourcing quality through objective tasks. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) (pp. 1456-1461). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2012/index.html

Vancouver

Aker A, El-Haj M, Albakour MD, Kruschwitz U. Assessing crowdsourcing quality through objective tasks. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). Istanbul, Turkey: European Language Resources Association (ELRA). 2012. p. 1456-1461

Author

Aker, Ahmet ; El-Haj, Mahmoud ; Albakour, M-Dyaa et al. / Assessing crowdsourcing quality through objective tasks. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). Istanbul, Turkey : European Language Resources Association (ELRA), 2012. pp. 1456-1461

Bibtex

@inproceedings{0ca25cb5e2224846816a038aa32e18a5,

title = "Assessing crowdsourcing quality through objective tasks",

abstract = "The emergence of crowdsourcing as a commonly used approach to collect vast quantities of human assessments on a variety of tasks represents nothing less than a paradigm shift. This is particularly true in academic research where it has suddenly become possible to collect (high-quality) annotations rapidly without the need of an expert. In this paper we investigate factors which can influence the quality of the results obtained through Amazon's Mechanical Turk crowdsourcing platform. We investigated the impact of different presentation methods (free text versus radio buttons), workers' base (USA versus India as the main bases of MTurk workers) and payment scale (about $4, $8 and $10 per hour) on the quality of the results. For each run we assessed the results provided by 25 workers on a set of 10 tasks. We run two different experiments using objective tasks: maths and general text questions. In both tasks the answers are unique, which eliminates the uncertainty usually present in subjective tasks, where it is not clear whether the unexpected answer is caused by a lack of worker's motivation, the worker's interpretation of the task or genuine ambiguity. In this work we present our results comparing the influence of the different factors used. One of the interesting findings is that our results do not confirm previous studies which concluded that an increase in payment attracts more noise. We also find that the country of origin only has an impact in some of the categories and only in general text questions but there is no significant difference at the top pay.",

keywords = "Mechanical Turk, Objective Metrics, Evaluation",

author = "Ahmet Aker and Mahmoud El-Haj and M-Dyaa Albakour and Udo Kruschwitz",

year = "2012",

language = "English",

isbn = "9782951740877",

pages = "1456--1461",

booktitle = "Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)",

publisher = "European Language Resources Association (ELRA)",

}

RIS

TY - GEN

T1 - Assessing crowdsourcing quality through objective tasks

AU - Aker, Ahmet

AU - El-Haj, Mahmoud

AU - Albakour, M-Dyaa

AU - Kruschwitz, Udo

PY - 2012

Y1 - 2012

N2 - The emergence of crowdsourcing as a commonly used approach to collect vast quantities of human assessments on a variety of tasks represents nothing less than a paradigm shift. This is particularly true in academic research where it has suddenly become possible to collect (high-quality) annotations rapidly without the need of an expert. In this paper we investigate factors which can influence the quality of the results obtained through Amazon's Mechanical Turk crowdsourcing platform. We investigated the impact of different presentation methods (free text versus radio buttons), workers' base (USA versus India as the main bases of MTurk workers) and payment scale (about $4, $8 and $10 per hour) on the quality of the results. For each run we assessed the results provided by 25 workers on a set of 10 tasks. We run two different experiments using objective tasks: maths and general text questions. In both tasks the answers are unique, which eliminates the uncertainty usually present in subjective tasks, where it is not clear whether the unexpected answer is caused by a lack of worker's motivation, the worker's interpretation of the task or genuine ambiguity. In this work we present our results comparing the influence of the different factors used. One of the interesting findings is that our results do not confirm previous studies which concluded that an increase in payment attracts more noise. We also find that the country of origin only has an impact in some of the categories and only in general text questions but there is no significant difference at the top pay.

AB - The emergence of crowdsourcing as a commonly used approach to collect vast quantities of human assessments on a variety of tasks represents nothing less than a paradigm shift. This is particularly true in academic research where it has suddenly become possible to collect (high-quality) annotations rapidly without the need of an expert. In this paper we investigate factors which can influence the quality of the results obtained through Amazon's Mechanical Turk crowdsourcing platform. We investigated the impact of different presentation methods (free text versus radio buttons), workers' base (USA versus India as the main bases of MTurk workers) and payment scale (about $4, $8 and $10 per hour) on the quality of the results. For each run we assessed the results provided by 25 workers on a set of 10 tasks. We run two different experiments using objective tasks: maths and general text questions. In both tasks the answers are unique, which eliminates the uncertainty usually present in subjective tasks, where it is not clear whether the unexpected answer is caused by a lack of worker's motivation, the worker's interpretation of the task or genuine ambiguity. In this work we present our results comparing the influence of the different factors used. One of the interesting findings is that our results do not confirm previous studies which concluded that an increase in payment attracts more noise. We also find that the country of origin only has an impact in some of the categories and only in general text questions but there is no significant difference at the top pay.

KW - Mechanical Turk

KW - Objective Metrics

KW - Evaluation

M3 - Conference contribution/Paper

SN - 9782951740877

SP - 1456

EP - 1461

BT - Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

PB - European Language Resources Association (ELRA)

CY - Istanbul, Turkey

ER -
