Using J-K-fold Cross Validation to Reduce Variance When Tuning NLP Models

Electronic data

coling2018
Accepted author manuscript, 453 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

E-pub ahead of print

Publication date	06/2018
Host publication	Proceedings of COLING 2018
Publisher	Association for Computational Linguistics (ACL Anthology)
Pages	2978–2989
Number of pages	12
Original language	English
Event	Conference on Computational Linguistics, Santa Fe Community Convention Center, Santa Fe, United States. Duration: 20/08/2018 → 26/08/2018. Conference number: 27. https://coling2018.org/

Conference

Conference	Conference on Computational Linguistics
Abbreviated title	COLING
Country/Territory	United States
City	Santa Fe
Period	20/08/18 → 26/08/18
Internet address	https://coling2018.org/

Publication series

Name	Proceedings of COLING 2018
Publisher	Association for Computational Linguistics
ISSN (Print)	1525-2477

Abstract

K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning. However, the very process of CV requires random partitioning of the data and so our performance estimates are in fact stochastic, with variability that can be substantial for natural language processing tasks. We demonstrate that these unstable estimates cannot be relied upon for effective parameter tuning. The resulting tuned parameters are highly sensitive to how our data is partitioned, meaning that we often select sub-optimal parameter choices and have serious reproducibility issues.

Instead, we propose to use the less variable J-K-fold CV, in which J independent K-fold cross validations are used to assess performance. Our main contributions are extending J-K-fold CV from performance estimation to parameter tuning and investigating how to choose J and K. We argue that variability is more important than bias for effective tuning and so advocate lower choices of K than are typically seen in the NLP literature, instead using the saved computation to increase J. To demonstrate the generality of our recommendations we investigate a wide range of case studies: sentiment classification (both general and target-specific), part-of-speech tagging and document classification.
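
To make the procedure concrete, the following is a minimal Python sketch of J-K-fold CV used for parameter tuning. It is an illustration under stated assumptions, not the authors' implementation (their code is in the repository listed under the bibliographic note). The names `k_fold_indices`, `jk_fold_score`, `tune` and `make_scorer` are hypothetical, the data and threshold classifier are toy examples, and reusing the same random partitions for every candidate is a design choice of this sketch rather than something the abstract specifies.

```python
import random
from statistics import mean


def k_fold_indices(n, k, rng):
    """Randomly partition range(n) into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def jk_fold_score(data, fit_and_score, j, k, seed=0):
    """Average held-out scores over J independent K-fold partitions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(j):  # J independent random partitions
        for fold in k_fold_indices(len(data), k, rng):  # K folds each
            held_out = set(fold)
            train = [x for i, x in enumerate(data) if i not in held_out]
            test = [data[i] for i in fold]
            scores.append(fit_and_score(train, test))
    return mean(scores)


def tune(data, candidates, make_scorer, j, k, seed=0):
    """Pick the candidate with the highest J-K-fold score.

    Assumption of this sketch: the same seed (hence the same
    partitions) is reused for every candidate.
    """
    results = {c: jk_fold_score(data, make_scorer(c), j, k, seed)
               for c in candidates}
    return max(results, key=results.get), results


# Toy data: 20 points on a grid, labelled positive when x > 0.5.
data = [(i / 20, i / 20 > 0.5) for i in range(20)]


def make_scorer(threshold):
    """Hypothetical model: predict positive when x > threshold.

    This toy model has nothing to fit, so the training split is unused.
    """
    def fit_and_score(train, test):
        return sum((x > threshold) == y for x, y in test) / len(test)
    return fit_and_score


best, results = tune(data, [0.2, 0.5, 0.8], make_scorer, j=4, k=5)
print(best, results)
```

Each call to `jk_fold_score` costs J × K model fits, so for a fixed budget a smaller K leaves room for a larger J, which is the trade-off the abstract recommends.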

Bibliographic note

COLING 2018. Code available at: https://github.com/henrymoss/COLING2018
