Feature selection using stochastic approximation with Barzilai and Borwein non-monotone gains

Management Science

Text available via DOI:

https://doi.org/10.1016/j.cor.2021.105334
Final published version

Keywords

Barzilai and Borwein method, Explainable artificial intelligence, Feature selection, Genetic algorithm, Gradient descent, Stochastic approximation


Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Feature selection using stochastic approximation with Barzilai and Borwein non-monotone gains. / Aksakalli, Vural; D. Yenice, Zeren; Malekipirbazari, Milad et al.
In: Computers and Operations Research, Vol. 132, 105334, 31.08.2021.


Harvard

Aksakalli, V, D. Yenice, Z, Malekipirbazari, M & Kargar, K 2021, 'Feature selection using stochastic approximation with Barzilai and Borwein non-monotone gains', Computers and Operations Research, vol. 132, 105334. https://doi.org/10.1016/j.cor.2021.105334

APA

Aksakalli, V., D. Yenice, Z., Malekipirbazari, M., & Kargar, K. (2021). Feature selection using stochastic approximation with Barzilai and Borwein non-monotone gains. Computers and Operations Research, 132, Article 105334. https://doi.org/10.1016/j.cor.2021.105334

Vancouver

Aksakalli V, D. Yenice Z, Malekipirbazari M, Kargar K. Feature selection using stochastic approximation with Barzilai and Borwein non-monotone gains. Computers and Operations Research. 2021 Aug 31;132:105334. Epub 2021 Apr 22. doi: 10.1016/j.cor.2021.105334

Author

Aksakalli, Vural ; D. Yenice, Zeren ; Malekipirbazari, Milad et al. / Feature selection using stochastic approximation with Barzilai and Borwein non-monotone gains. In: Computers and Operations Research. 2021 ; Vol. 132.

Bibtex

@article{81847f4bfdb7475598656c8481a57e5c,

title = "Feature selection using stochastic approximation with Barzilai and Borwein non-monotone gains",

abstract = "With the recent emergence of machine learning problems with a massive number of features, feature selection (FS) has become an increasingly important tool to mitigate the effects of the so-called curse of dimensionality. FS aims to eliminate redundant and irrelevant features for models that are faster to train, easier to understand, and less prone to overfitting. This study presents a wrapper FS method based on Simultaneous Perturbation Stochastic Approximation (SPSA) with Barzilai and Borwein (BB) non-monotone gains within a pseudo-gradient descent framework wherein performance is measured via cross-validation. We illustrate that SPSA with BB gains (SPSA-BB) provides dramatic improvements in terms of the number of iterations for convergence with minimal degradation in cross-validated error performance over the current state-of-the-art approach with monotone gains (SPSA-MON). In addition, SPSA-BB requires only one internal parameter and therefore eliminates the need for careful fine-tuning of numerous other internal parameters as in SPSA-MON or comparable meta-heuristic FS methods such as genetic algorithms (GA). Our particular implementation includes gradient averaging as well as gain smoothing for better convergence properties. We present computational experiments on various public datasets with Nearest Neighbors and Naive Bayes classifiers as wrappers. We present comparisons of SPSA-BB against the full set of features, SPSA-MON, as well as seven popular meta-heuristic-based FS algorithms including GA and particle swarm optimization. Our results indicate that SPSA-BB converges to a good feature set in about 50 iterations on average regardless of the number of features (whether a dozen or more than 1000 features) and its performance is quite competitive. SPSA-BB can be considered extremely fast for a wrapper method and therefore it stands as a high-performing new feature selection method that is also computationally feasible in practice.",

keywords = "Barzilai and Borwein method, Explainable artificial intelligence, Feature selection, Genetic algorithm, Gradient descent, Stochastic approximation",

author = "Vural Aksakalli and {D. Yenice}, Zeren and Milad Malekipirbazari and Kamyar Kargar",

year = "2021",

month = aug,

day = "31",

doi = "10.1016/j.cor.2021.105334",

language = "English",

volume = "132",

pages = "105334",

journal = "Computers and Operations Research",

issn = "0305-0548",

publisher = "Elsevier Ltd",

}

RIS

TY - JOUR

T1 - Feature selection using stochastic approximation with Barzilai and Borwein non-monotone gains

AU - Aksakalli, Vural

AU - D. Yenice, Zeren

AU - Malekipirbazari, Milad

AU - Kargar, Kamyar

PY - 2021/8/31

Y1 - 2021/8/31

N2 - With the recent emergence of machine learning problems with a massive number of features, feature selection (FS) has become an increasingly important tool to mitigate the effects of the so-called curse of dimensionality. FS aims to eliminate redundant and irrelevant features for models that are faster to train, easier to understand, and less prone to overfitting. This study presents a wrapper FS method based on Simultaneous Perturbation Stochastic Approximation (SPSA) with Barzilai and Borwein (BB) non-monotone gains within a pseudo-gradient descent framework wherein performance is measured via cross-validation. We illustrate that SPSA with BB gains (SPSA-BB) provides dramatic improvements in terms of the number of iterations for convergence with minimal degradation in cross-validated error performance over the current state-of-the-art approach with monotone gains (SPSA-MON). In addition, SPSA-BB requires only one internal parameter and therefore eliminates the need for careful fine-tuning of numerous other internal parameters as in SPSA-MON or comparable meta-heuristic FS methods such as genetic algorithms (GA). Our particular implementation includes gradient averaging as well as gain smoothing for better convergence properties. We present computational experiments on various public datasets with Nearest Neighbors and Naive Bayes classifiers as wrappers. We present comparisons of SPSA-BB against the full set of features, SPSA-MON, as well as seven popular meta-heuristic-based FS algorithms including GA and particle swarm optimization. Our results indicate that SPSA-BB converges to a good feature set in about 50 iterations on average regardless of the number of features (whether a dozen or more than 1000 features) and its performance is quite competitive. SPSA-BB can be considered extremely fast for a wrapper method and therefore it stands as a high-performing new feature selection method that is also computationally feasible in practice.

AB - With the recent emergence of machine learning problems with a massive number of features, feature selection (FS) has become an increasingly important tool to mitigate the effects of the so-called curse of dimensionality. FS aims to eliminate redundant and irrelevant features for models that are faster to train, easier to understand, and less prone to overfitting. This study presents a wrapper FS method based on Simultaneous Perturbation Stochastic Approximation (SPSA) with Barzilai and Borwein (BB) non-monotone gains within a pseudo-gradient descent framework wherein performance is measured via cross-validation. We illustrate that SPSA with BB gains (SPSA-BB) provides dramatic improvements in terms of the number of iterations for convergence with minimal degradation in cross-validated error performance over the current state-of-the-art approach with monotone gains (SPSA-MON). In addition, SPSA-BB requires only one internal parameter and therefore eliminates the need for careful fine-tuning of numerous other internal parameters as in SPSA-MON or comparable meta-heuristic FS methods such as genetic algorithms (GA). Our particular implementation includes gradient averaging as well as gain smoothing for better convergence properties. We present computational experiments on various public datasets with Nearest Neighbors and Naive Bayes classifiers as wrappers. We present comparisons of SPSA-BB against the full set of features, SPSA-MON, as well as seven popular meta-heuristic-based FS algorithms including GA and particle swarm optimization. Our results indicate that SPSA-BB converges to a good feature set in about 50 iterations on average regardless of the number of features (whether a dozen or more than 1000 features) and its performance is quite competitive. SPSA-BB can be considered extremely fast for a wrapper method and therefore it stands as a high-performing new feature selection method that is also computationally feasible in practice.

KW - Barzilai and Borwein method

KW - Explainable artificial intelligence

KW - Feature selection

KW - Genetic algorithm

KW - Gradient descent

KW - Stochastic approximation

U2 - 10.1016/j.cor.2021.105334

DO - 10.1016/j.cor.2021.105334

M3 - Journal article

AN - SCOPUS:85104730087

VL - 132

JO - Computers and Operations Research

JF - Computers and Operations Research

SN - 0305-0548

M1 - 105334

ER -
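The abstract describes SPSA-BB only at a high level. As a rough illustration of the general shape of such a search, the following Python sketch runs SPSA over feature weights in [0, 1] with Barzilai and Borwein step sizes, gradient averaging, and gain smoothing. It is not the authors' implementation: the function name `spsa_bb_select`, the 0.5 rounding threshold, the default values of `c`, `a0`, `n_grad_avg`, and `gain_window`, the moving-average form of gain smoothing, and the hypothetical `toy_loss` objective (a stand-in for cross-validated classifier error) are all assumptions made for this example.

```python
import random
from typing import Callable, List, Sequence, Tuple


def spsa_bb_select(
    loss: Callable[[List[bool]], float],
    p: int,
    iters: int = 50,
    c: float = 0.05,           # assumed perturbation size
    a0: float = 0.05,          # assumed gain for the very first step
    n_grad_avg: int = 2,       # assumed number of gradient draws to average
    gain_window: int = 3,      # assumed moving-average window for gains
    seed: int = 0,
) -> Tuple[List[bool], float]:
    """Hypothetical SPSA search over feature masks with smoothed BB gains.

    Weights w in [0, 1]^p are rounded at 0.5 to form a mask; `loss` stands
    in for the cross-validated error of a wrapped classifier. Returns the
    best mask evaluated and its loss.
    """
    rng = random.Random(seed)

    def clip(v: Sequence[float]) -> List[float]:
        return [min(1.0, max(0.0, x)) for x in v]

    def to_mask(v: Sequence[float]) -> List[bool]:
        mask = [x >= 0.5 for x in v]
        if not any(mask):  # never evaluate an empty feature set
            mask[max(range(p), key=lambda i: v[i])] = True
        return mask

    def dot(u: Sequence[float], v: Sequence[float]) -> float:
        return sum(a * b for a, b in zip(u, v))

    w = [0.5] * p  # rounds to the full feature set
    best_mask = to_mask(w)
    best_loss = loss(best_mask)
    w_prev = g_prev = None
    gains: List[float] = []

    for _ in range(iters):
        # Simultaneous-perturbation gradient estimate, averaged over draws.
        g = [0.0] * p
        for _ in range(n_grad_avg):
            delta = [rng.choice((-1.0, 1.0)) for _ in range(p)]
            y_plus = loss(to_mask(clip([wi + c * di for wi, di in zip(w, delta)])))
            y_minus = loss(to_mask(clip([wi - c * di for wi, di in zip(w, delta)])))
            scale = (y_plus - y_minus) / (2.0 * c * n_grad_avg)
            g = [gi + scale * di for gi, di in zip(g, delta)]  # 1/delta_i == delta_i

        # BB gain |s.s / s.y|; reuse the last gain when it is undefined.
        if w_prev is None:
            gain = a0
        else:
            s = [a - b for a, b in zip(w, w_prev)]
            y = [a - b for a, b in zip(g, g_prev)]
            sy = dot(s, y)
            gain = abs(dot(s, s) / sy) if sy != 0.0 else gains[-1]
        gains.append(gain)
        recent = gains[-gain_window:]
        smoothed = sum(recent) / len(recent)  # gain smoothing (assumed form)

        w_prev, g_prev = w, g
        w = clip([wi - smoothed * gi for wi, gi in zip(w, g)])

        mask = to_mask(w)
        current = loss(mask)
        if current < best_loss:
            best_mask, best_loss = mask, current

    return best_mask, best_loss


RELEVANT = {0, 1, 2}  # hypothetical "true" features for the toy objective


def toy_loss(mask: List[bool]) -> float:
    # Stand-in for cross-validated error: missed relevant features cost 2, extras cost 1.
    missed = sum(1 for i in RELEVANT if not mask[i])
    extra = sum(1 for i, m in enumerate(mask) if m and i not in RELEVANT)
    return (2 * missed + extra) / len(mask)


if __name__ == "__main__":
    mask, best = spsa_bb_select(toy_loss, p=10)
    print([i for i, m in enumerate(mask) if m], best)
```

Because this sketch starts from weights of 0.5, which round to the full feature set, and keeps the best mask it has evaluated, its returned loss is never worse than that of the full set. Each iteration costs 2 × `n_grad_avg` + 1 loss evaluations, so with a wrapper objective the run time is dominated by the cross-validation calls.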
