

Using JK-fold Cross Validation To Reduce Variance When Tuning NLP Models

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Using JK-fold Cross Validation To Reduce Variance When Tuning NLP Models. / Moss, Henry; Leslie, David; Rayson, Paul.
Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe: Association for Computational Linguistics, 2018. pp. 2978-2989.


Harvard

Moss, H, Leslie, D & Rayson, P 2018, Using JK-fold Cross Validation To Reduce Variance When Tuning NLP Models. in Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics, Santa Fe, pp. 2978-2989, Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, United States, 1/08/18. <https://aclanthology.org/C18-1252/>

APA

Moss, H., Leslie, D., & Rayson, P. (2018). Using JK-fold Cross Validation To Reduce Variance When Tuning NLP Models. In Proceedings of the 27th International Conference on Computational Linguistics (pp. 2978-2989). Association for Computational Linguistics. https://aclanthology.org/C18-1252/

Vancouver

Moss H, Leslie D, Rayson P. Using JK-fold Cross Validation To Reduce Variance When Tuning NLP Models. In Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe: Association for Computational Linguistics. 2018. p. 2978-2989

Author

Moss, Henry ; Leslie, David ; Rayson, Paul. / Using JK-fold Cross Validation To Reduce Variance When Tuning NLP Models. Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe : Association for Computational Linguistics, 2018. pp. 2978-2989

Bibtex

@inproceedings{613922c5859e4f949e12d68f30e29fd6,
title = "Using JK-fold Cross Validation To Reduce Variance When Tuning NLP Models",
abstract = "K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning. However, the very process of CV requires random partitioning of the data and so our performance estimates are in fact stochastic, with variability that can be substantial for natural language processing tasks. We demonstrate that these unstable estimates cannot be relied upon for effective parameter tuning. The resulting tuned parameters are highly sensitive to how our data is partitioned, meaning that we often select sub-optimal parameter choices and have serious reproducibility issues. Instead, we propose to use the less variable J-K-fold CV, in which J independent K-fold cross validations are used to assess performance. Our main contributions are extending J-K-fold CV from performance estimation to parameter tuning and investigating how to choose J and K. We argue that variability is more important than bias for effective tuning and so advocate lower choices of K than are typically seen in the NLP literature and instead use the saved computation to increase J. To demonstrate the generality of our recommendations we investigate a wide range of case-studies: sentiment classification (both general and target-specific), part-of-speech tagging and document classification.",
author = "Henry Moss and David Leslie and Paul Rayson",
year = "2018",
month = aug,
day = "31",
language = "English",
pages = "2978--2989",
booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
publisher = "Association for Computational Linguistics",
note = "Proceedings of the 27th International Conference on Computational Linguistics ; Conference date: 01-08-2018",
url = "https://aclanthology.org/C18-1252/",
}
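The J-K-fold procedure the abstract describes — averaging J independent K-fold cross validations, each on a fresh random partition, to reduce the variance of the performance estimate — can be sketched as below. This is a minimal illustration, not the authors' implementation; `fit_score` is a hypothetical callback that trains a model on the given training indices and returns its score on the test indices.

```python
import random
import statistics

def jk_fold_cv(n, fit_score, J=5, K=5, seed=0):
    """Average K-fold CV score over J independent random partitions.

    n          -- number of examples in the dataset
    fit_score  -- hypothetical callback: (train_idx, test_idx) -> score
    J, K       -- number of repeats and folds; the paper advocates
                  smaller K than usual, spending the savings on larger J
    """
    rng = random.Random(seed)
    repeat_scores = []
    for _ in range(J):
        idx = list(range(n))
        rng.shuffle(idx)                       # fresh random partition per repeat
        folds = [idx[i::K] for i in range(K)]  # K roughly equal folds
        fold_scores = []
        for k in range(K):
            test_idx = folds[k]
            train_idx = [i for f, fold in enumerate(folds)
                         if f != k for i in fold]
            fold_scores.append(fit_score(train_idx, test_idx))
        repeat_scores.append(statistics.mean(fold_scores))
    # averaging over J partitions damps the partition-induced variance
    return statistics.mean(repeat_scores)
```

For parameter tuning, one would call `jk_fold_cv` once per candidate parameter setting and pick the setting with the best averaged score; because each estimate pools J partitions, the chosen parameters are less sensitive to how the data happened to be split.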

RIS

TY - GEN

T1 - Using JK-fold Cross Validation To Reduce Variance When Tuning NLP Models

AU - Moss, Henry

AU - Leslie, David

AU - Rayson, Paul

PY - 2018/8/31

Y1 - 2018/8/31

N2 - K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning. However, the very process of CV requires random partitioning of the data and so our performance estimates are in fact stochastic, with variability that can be substantial for natural language processing tasks. We demonstrate that these unstable estimates cannot be relied upon for effective parameter tuning. The resulting tuned parameters are highly sensitive to how our data is partitioned, meaning that we often select sub-optimal parameter choices and have serious reproducibility issues. Instead, we propose to use the less variable J-K-fold CV, in which J independent K-fold cross validations are used to assess performance. Our main contributions are extending J-K-fold CV from performance estimation to parameter tuning and investigating how to choose J and K. We argue that variability is more important than bias for effective tuning and so advocate lower choices of K than are typically seen in the NLP literature and instead use the saved computation to increase J. To demonstrate the generality of our recommendations we investigate a wide range of case-studies: sentiment classification (both general and target-specific), part-of-speech tagging and document classification.

AB - K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning. However, the very process of CV requires random partitioning of the data and so our performance estimates are in fact stochastic, with variability that can be substantial for natural language processing tasks. We demonstrate that these unstable estimates cannot be relied upon for effective parameter tuning. The resulting tuned parameters are highly sensitive to how our data is partitioned, meaning that we often select sub-optimal parameter choices and have serious reproducibility issues. Instead, we propose to use the less variable J-K-fold CV, in which J independent K-fold cross validations are used to assess performance. Our main contributions are extending J-K-fold CV from performance estimation to parameter tuning and investigating how to choose J and K. We argue that variability is more important than bias for effective tuning and so advocate lower choices of K than are typically seen in the NLP literature and instead use the saved computation to increase J. To demonstrate the generality of our recommendations we investigate a wide range of case-studies: sentiment classification (both general and target-specific), part-of-speech tagging and document classification.

M3 - Conference contribution/Paper

SP - 2978

EP - 2989

BT - Proceedings of the 27th International Conference on Computational Linguistics

PB - Association for Computational Linguistics

CY - Santa Fe

T2 - Proceedings of the 27th International Conference on Computational Linguistics

Y2 - 1 August 2018

ER -