Final published version, 3.06 MB, PDF document
Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
Research output: Thesis › Doctoral Thesis
}
TY - BOOK
T1 - Multivariate response predictor selection methods
T2 - with applications to telecommunications time series data
AU - Lowther, Aaron
PY - 2020/2/13
Y1 - 2020/2/13
N2 - This thesis looks at developing a semi-automated approach to estimate multiple, sparse, linear regression models simultaneously. We are motivated by a telecommunications application and aim to produce interpretable models. Firstly, we generalise the best-subset problem which is often used to estimate sparse linear regression models. We call our problem the Simultaneous Best-Subset (SBS) problem and use it to simultaneously estimate multiple linear regression models. The so-called SBS approach produces models that perform more favorably in comparison to models estimated individually using the best-subset approach. We solve the SBS problem by formulating a Mixed Integer Quadratic Optimisation (MIQO) program which can often be solved quickly using an optimisation solver. The MIQO framework allows us to have some control over the regression models estimated which is desirable in an automated setting. Secondly, we propose a simultaneous shrinkage operator. This operator shrinks coefficients between models towards a common value. We show that this operator can further improve parameter estimation when simultaneously estimating multiple linear regression models. This operator was found to be particularly useful when noisy predictors entered the models. Thirdly, we show how the SBS approach can be integrated into a two-step semi-automated procedure for fitting REGression Seasonal AutoRegressive Integrated Moving Average (Reg-SARIMA) models. We apply this automated approach to estimate models for a telecommunications dataset and compare it to the current approach employed by our industrial collaborator. We show how the Reg-SARIMA models provide a better fit to the data, are more interpretable, and perform more favourably for future short-term predictions. In addition to this, the two-step procedure requires much less human intervention into the modelling procedure than procedures currently used by industry. Finally, we propose fast approaches to simultaneously estimate multiple sparse linear regression models. Using a simulation study we show that these approaches often produce models that perform as favorably as the SBS approach, despite producing models in far less time.
AB - This thesis looks at developing a semi-automated approach to estimate multiple, sparse, linear regression models simultaneously. We are motivated by a telecommunications application and aim to produce interpretable models. Firstly, we generalise the best-subset problem which is often used to estimate sparse linear regression models. We call our problem the Simultaneous Best-Subset (SBS) problem and use it to simultaneously estimate multiple linear regression models. The so-called SBS approach produces models that perform more favorably in comparison to models estimated individually using the best-subset approach. We solve the SBS problem by formulating a Mixed Integer Quadratic Optimisation (MIQO) program which can often be solved quickly using an optimisation solver. The MIQO framework allows us to have some control over the regression models estimated which is desirable in an automated setting. Secondly, we propose a simultaneous shrinkage operator. This operator shrinks coefficients between models towards a common value. We show that this operator can further improve parameter estimation when simultaneously estimating multiple linear regression models. This operator was found to be particularly useful when noisy predictors entered the models. Thirdly, we show how the SBS approach can be integrated into a two-step semi-automated procedure for fitting REGression Seasonal AutoRegressive Integrated Moving Average (Reg-SARIMA) models. We apply this automated approach to estimate models for a telecommunications dataset and compare it to the current approach employed by our industrial collaborator. We show how the Reg-SARIMA models provide a better fit to the data, are more interpretable, and perform more favourably for future short-term predictions. In addition to this, the two-step procedure requires much less human intervention into the modelling procedure than procedures currently used by industry. Finally, we propose fast approaches to simultaneously estimate multiple sparse linear regression models. Using a simulation study we show that these approaches often produce models that perform as favorably as the SBS approach, despite producing models in far less time.
KW - predictor selection
KW - time series
KW - multivariate analysis
KW - multivariate linear regression
KW - SARIMA
KW - optimization
U2 - 10.17635/lancaster/thesis/873
DO - 10.17635/lancaster/thesis/873
M3 - Doctoral Thesis
PB - Lancaster University
ER -