
Electronic data

  • 2019lowtherphd

    Final published version, 3.06 MB, PDF document

    Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License



Multivariate response predictor selection methods: with applications to telecommunications time series data

Research output: Thesis › Doctoral Thesis

Published
Publication date: 13/02/2020
Number of pages: 169
Qualification: PhD
Awarding Institution:
Supervisors/Advisors:
Thesis sponsors:
  • Engineering and Physical Sciences Research Council (EPSRC)
  • BT Applied Research
Award date: 8/02/2020
Publisher:
  • Lancaster University
Original language: English

Abstract

This thesis develops a semi-automated approach for estimating multiple sparse linear regression models simultaneously. We are motivated by a telecommunications application and aim to produce interpretable models.

Firstly, we generalise the best-subset problem, which is often used to estimate sparse linear regression models. We call the generalisation the Simultaneous Best-Subset (SBS) problem and use it to estimate multiple linear regression models simultaneously. The resulting SBS approach produces models that perform more favourably than models estimated individually with the best-subset approach. We solve the SBS problem by formulating it as a Mixed Integer Quadratic Optimisation (MIQO) program, which an optimisation solver can often solve quickly. The MIQO framework also gives us some control over the estimated regression models, which is desirable in an automated setting.
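To make the SBS idea concrete, here is a minimal Python sketch that solves a toy version of the problem by exhaustive search instead of MIQO. It assumes (our simplification, not necessarily the thesis's formulation) that every response shares one support of exactly s predictors, that there is no intercept, and that the objective is the residual sum of squares (RSS) summed over the responses. All function names are hypothetical. An MIQO formulation would replace the enumeration with binary indicator variables handed to a solver, which is what allows larger problems to be attempted in practice.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_rss(X, y, cols):
    """Least-squares fit of y on columns `cols` of X (no intercept); returns (coefs, RSS)."""
    Xs = [[row[j] for j in cols] for row in X]
    k = len(cols)
    xtx = [[sum(r[i] * r[j] for r in Xs) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(Xs, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)  # normal equations X'X beta = X'y
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(Xs, y))
    return beta, rss


def simultaneous_best_subset(X, Ys, s):
    """Hypothetical toy SBS: choose ONE support of size s shared by all responses in Ys,
    minimising the RSS summed over responses. Returns (total_rss, support, coefs_per_response)."""
    p = len(X[0])
    best = None
    for cols in combinations(range(p), s):
        try:
            fits = [ols_rss(X, y, cols) for y in Ys]
        except ValueError:
            continue  # skip supports whose design matrix is rank-deficient
        total = sum(rss for _, rss in fits)
        if best is None or total < best[0]:
            best = (total, cols, [beta for beta, _ in fits])
    return best
```

For example, if two responses are generated as y1 = 2·x0 and y2 = −x0 + 3·x2 from a suitably varied design X with three columns, simultaneous_best_subset(X, [y1, y2], 2) should return the shared support (0, 2) with a total RSS of essentially zero. The exhaustive loop visits C(p, s) supports, so it only suits very small p.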

Secondly, we propose a simultaneous shrinkage operator, which shrinks corresponding coefficients across the models towards a common value. We show that this operator can further improve parameter estimation when multiple linear regression models are estimated simultaneously. It was found to be particularly useful when noisy predictors entered the models.
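The abstract does not give the operator's form, so the following Python sketch is only a hypothetical stand-in that illustrates "shrinking towards a common value": a post-hoc convex combination that moves each predictor's non-zero coefficients towards their mean across models by a factor lam in [0, 1]. Leaving zero coefficients untouched keeps each model's selected predictors unchanged. The name simultaneous_shrink and the pooling rule are our assumptions, not the thesis's operator.

```python
def simultaneous_shrink(B, lam):
    """Hypothetical cross-model shrinkage.

    B[k][j] is the coefficient of predictor j in model k. For each predictor, the
    non-zero (selected) coefficients are pulled towards their mean across models:
        b -> (1 - lam) * b + lam * mean.
    lam = 0 leaves B unchanged; lam = 1 makes all active coefficients of a predictor equal.
    Zero coefficients stay zero, so each model's sparsity pattern is preserved.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    n_models, n_pred = len(B), len(B[0])
    out = [row[:] for row in B]  # do not modify the caller's matrix
    for j in range(n_pred):
        active = [k for k in range(n_models) if B[k][j] != 0.0]
        if len(active) < 2:
            continue  # a predictor used by at most one model has nothing to pool with
        centre = sum(B[k][j] for k in active) / len(active)
        for k in active:
            out[k][j] = (1.0 - lam) * B[k][j] + lam * centre
    return out
```

With B = [[1.0, 0.0, 4.0], [3.0, 2.0, 0.0]] and lam = 0.5, only the first predictor is shared, so its coefficients 1.0 and 3.0 move to 1.5 and 2.5 while the others are left alone.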

Thirdly, we show how the SBS approach can be integrated into a two-step, semi-automated procedure for fitting Regression Seasonal AutoRegressive Integrated Moving Average (Reg-SARIMA) models. We apply this procedure to estimate models for a telecommunications dataset and compare it with the approach currently employed by our industrial collaborator. We show that the Reg-SARIMA models fit the data better, are more interpretable, and perform more favourably in short-term forecasting. In addition, the two-step procedure requires much less human intervention than the modelling procedures currently used in industry.
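For reference, the standard form of a linear regression with seasonal ARIMA errors is written out below. We assume, from the abstract alone, that the Reg-SARIMA models take this form, with S the set of predictors chosen in the first step and the SARIMA error structure identified in the second; this split is our reading rather than a statement of the thesis's exact procedure.

```latex
y_t = \sum_{j \in S} \beta_j \, x_{j,t} + \eta_t,
\qquad
\phi_p(B)\,\Phi_P(B^m)\,(1 - B)^d\,(1 - B^m)^D\,\eta_t = \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t
```

Here B is the backshift operator (B y_t = y_{t-1}), m is the seasonal period, φ_p, Φ_P, θ_q and Θ_Q are the non-seasonal and seasonal autoregressive and moving-average polynomials, and ε_t is white noise. The regression part carries the interpretable effects of the selected predictors, while the error process absorbs the remaining autocorrelation and seasonality.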

Finally, we propose fast approaches for simultaneously estimating multiple sparse linear regression models. Using a simulation study, we show that these approaches often produce models that perform as favourably as those from the SBS approach, while taking far less time to compute.
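As one illustration of the speed/optimality trade-off (a generic greedy heuristic, not necessarily one of the approaches proposed in the thesis), forward selection can grow a single shared support one predictor at a time, each time adding the predictor that most reduces the residual sum of squares summed over all responses. This needs on the order of p·s least-squares fits per response instead of the C(p, s) subsets visited by exhaustive search, but it carries no guarantee of finding the best subset. The sketch assumes a shared support and no intercept; all names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_rss(X, y, cols):
    """Least-squares fit of y on columns `cols` of X (no intercept); returns (coefs, RSS)."""
    Xs = [[row[j] for j in cols] for row in X]
    k = len(cols)
    xtx = [[sum(r[i] * r[j] for r in Xs) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(Xs, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(Xs, y))
    return beta, rss


def greedy_simultaneous_selection(X, Ys, s):
    """Hypothetical fast heuristic: forward selection of ONE support shared by all
    responses in Ys. Returns (sorted support, total RSS of the final support)."""
    p = len(X[0])
    support = []
    total = None
    for _ in range(s):
        best = None
        for j in range(p):
            if j in support:
                continue
            try:
                t = sum(ols_rss(X, y, support + [j])[1] for y in Ys)
            except ValueError:
                continue  # adding j makes the design rank-deficient
            if best is None or t < best[0]:
                best = (t, j)
        if best is None:
            break  # no admissible predictor left to add
        total, j = best
        support.append(j)
    return sorted(support), total
```

On the small design where y1 = 2·x0 and y2 = −x0 + 3·x2, the greedy search first adds x2 (the best single predictor for the two responses combined) and then x0, recovering the support [0, 2]; on harder problems a greedy path can miss the best subset, which is exactly what comparisons against an exact SBS solution would reveal.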