Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
TY - GEN
T1 - FIESTA: Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms
T2 - 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019
AU - Moss, Henry B.
AU - Moore, Andrew
AU - Leslie, David S.
AU - Rayson, Paul
PY - 2019/7/30
Y1 - 2019/7/30
N2 - We present FIESTA, a model selection approach that significantly reduces the computational resources required to reliably identify state-of-the-art performance from large collections of candidate models. Despite being known to produce unreliable comparisons, it is still common practice to compare model evaluations based on single choices of random seeds. We show that reliable model selection also requires evaluations based on multiple train-test splits (contrary to common practice in many shared tasks). Using bandit theory from the statistics literature, we are able to adaptively determine appropriate numbers of data splits and random seeds used to evaluate each model, focusing computational resources on the evaluation of promising models whilst avoiding wasting evaluations on models with lower performance. Furthermore, our user-friendly Python implementation produces confidence guarantees of correctly selecting the optimal model. We evaluate our algorithms by selecting between 8 target-dependent sentiment analysis methods using dramatically fewer model evaluations than current model selection approaches.
AB - We present FIESTA, a model selection approach that significantly reduces the computational resources required to reliably identify state-of-the-art performance from large collections of candidate models. Despite being known to produce unreliable comparisons, it is still common practice to compare model evaluations based on single choices of random seeds. We show that reliable model selection also requires evaluations based on multiple train-test splits (contrary to common practice in many shared tasks). Using bandit theory from the statistics literature, we are able to adaptively determine appropriate numbers of data splits and random seeds used to evaluate each model, focusing computational resources on the evaluation of promising models whilst avoiding wasting evaluations on models with lower performance. Furthermore, our user-friendly Python implementation produces confidence guarantees of correctly selecting the optimal model. We evaluate our algorithms by selecting between 8 target-dependent sentiment analysis methods using dramatically fewer model evaluations than current model selection approaches.
KW - Machine Learning
KW - Model selection
KW - Multi-armed bandit
KW - NLP
M3 - Conference contribution/Paper
SP - 2920
EP - 2930
BT - Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
PB - Association for Computational Linguistics
CY - Stroudsburg, PA
Y2 - 28 July 2019 through 2 August 2019
ER -
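
The abstract above describes adaptively concentrating evaluation budget on promising models. The short Python sketch below illustrates that general idea with a generic successive-halving loop; it is not the FIESTA algorithm or its package API. evaluate_model, the candidate model names, and all score values are hypothetical placeholders, and FIESTA itself additionally provides the confidence guarantees of correct selection described in the abstract.

import random
import statistics

# Hypothetical stand-in for a real training/evaluation routine: draw a
# fresh train-test split with `seed`, train `model`, and return its test
# score. The base scores and noise level here are invented for the demo.
def evaluate_model(model, seed):
    base = {"model_a": 0.70, "model_b": 0.72, "model_c": 0.68}
    rng = random.Random(seed)
    return base[model] + rng.gauss(0, 0.02)

# Generic successive-halving loop: give every surviving candidate a batch
# of fresh evaluations, then drop the weaker half, so most of the budget
# is spent on the most promising models.
def successive_halving(models, total_budget):
    scores = {m: [] for m in models}
    survivors = list(models)
    rounds = max(1, len(models).bit_length() - 1)
    per_round = total_budget // rounds
    seed = 0
    while len(survivors) > 1:
        per_model = max(1, per_round // len(survivors))
        for m in survivors:
            for _ in range(per_model):
                scores[m].append(evaluate_model(m, seed))
                seed += 1
        survivors.sort(key=lambda m: statistics.mean(scores[m]), reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0], scores

best, scores = successive_halving(["model_a", "model_b", "model_c"], total_budget=60)
print(best, {m: round(statistics.mean(s), 3) for m, s in scores.items()})

In a real setting, evaluate_model would retrain the candidate on a fresh train-test split determined by seed, which is the multi-split, multi-seed evaluation the abstract argues is necessary for reliable model selection.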