Rights statement: This is the author’s version of a work that was accepted for publication in International Journal of Forecasting. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in International Journal of Forecasting, 38, 4, 2021 DOI: 10.1016/j.ijforecast.2021.09.002
Accepted author manuscript, 1.11 MB, PDF document
Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
Journal publication date | 31/10/2022 |
---|---|
Journal | International Journal of Forecasting |
Issue number | 4 |
Volume | 38 |
Number of pages | 8 |
Pages (from-to) | 1492-1499 |
Publication Status | Published |
Early online date | 5/10/22 |
Original language | English |
The M5 accuracy competition presented a large-scale hierarchical forecasting problem in a realistic grocery retail setting in order to evaluate an extended range of forecasting methods, particularly those adopting machine learning. The top-ranking solutions adopted a global bottom-up approach: global forecasting methods generate the bottom-level forecasts in the hierarchy, and a bottom-up strategy then aggregates them into coherent forecasts for the aggregate levels. However, whether the observed superior performance of the global bottom-up approach is robust over various test periods, or only an accidental result, is an important question for retail forecasting researchers and practitioners. We conduct experiments to explore the robustness of the global bottom-up approach and comment on the efforts made by the top-ranking teams to improve the core approach. We find that the top-ranking global bottom-up approaches lack robustness across time periods in the M5 data. This inconsistent performance makes the M5 final rankings somewhat of a lottery. For future forecasting competitions, we suggest using multiple rolling test sets to evaluate forecasting performance in order to reward robustly performing forecasting methods; robustness is a much-needed characteristic in any application.
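The two ideas in the abstract, bottom-up aggregation and evaluation over multiple rolling test sets, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy hierarchy (a total over two bottom series), the simulated data, the last-value `naive_forecast` standing in for a global model, the plain RMSE standing in for the competition's own accuracy measure, and the function names themselves.

```python
import numpy as np


def bottom_up(bottom_forecasts, summing_matrix):
    """Aggregate bottom-level forecasts to every level of the hierarchy.

    bottom_forecasts: array of shape (n_bottom, horizon)
    summing_matrix:   0/1 array of shape (n_all, n_bottom)
    """
    return summing_matrix @ bottom_forecasts


def naive_forecast(history, horizon):
    """Hypothetical placeholder model: repeat each series' last observed value."""
    return np.repeat(history[:, -1:], horizon, axis=1)


def rolling_origin_rmse(bottom_series, summing_matrix, horizon, n_origins):
    """Return one RMSE (over all hierarchy levels) per rolling test window."""
    n_time = bottom_series.shape[1]
    scores = []
    for k in range(n_origins, 0, -1):
        split = n_time - k * horizon  # forecast origin for this window
        train = bottom_series[:, :split]
        test = bottom_series[:, split:split + horizon]
        forecast = bottom_up(naive_forecast(train, horizon), summing_matrix)
        actual = summing_matrix @ test
        scores.append(float(np.sqrt(np.mean((forecast - actual) ** 2))))
    return scores


# Toy hierarchy (assumed): rows are total, series A, series B
S = np.array([[1, 1],
              [1, 0],
              [0, 1]])

rng = np.random.default_rng(0)
bottom = rng.poisson(lam=5.0, size=(2, 60)).astype(float)  # simulated demand

scores = rolling_origin_rmse(bottom, S, horizon=7, n_origins=4)
print(scores)  # one error per test window; the spread hints at (non-)robustness
```

Comparing methods on the full list of per-window scores, rather than on a single final window, is the kind of evaluation the abstract recommends for rewarding consistent performance.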