Sentiment Analysis based Error Detection for Large-Scale Systems

Computing and Communications

Associated organisational units

Electronic data

DSN2021_Sentiment_Analysis_Model_For_Errors_Detection_In_Large_Scale_Systems
Accepted author manuscript, 1.65 MB, PDF document

Research output: Contribution to conference - Without ISBN/ISSN › Conference paper › peer-review

Published

Standard

Sentiment Analysis based Error Detection for Large-Scale Systems. / Alharthi, Khalid; Jhumka, Arshad; Di, Sheng et al.
2021. Paper presented at The 51st IEEE/IFIP International Conference on Dependable Systems and Networks, Taipei, Taiwan, Province of China.

Harvard

Alharthi, K, Jhumka, A, Di, S, Cappello, F & Chuah, E 2021, 'Sentiment Analysis based Error Detection for Large-Scale Systems', Paper presented at The 51st IEEE/IFIP International Conference on Dependable Systems and Networks, Taipei, Taiwan, Province of China, 21/06/21 - 24/06/21.

APA

Alharthi, K., Jhumka, A., Di, S., Cappello, F., & Chuah, E. (2021). Sentiment Analysis based Error Detection for Large-Scale Systems. Paper presented at The 51st IEEE/IFIP International Conference on Dependable Systems and Networks, Taipei, Taiwan, Province of China.

Vancouver

Alharthi K, Jhumka A, Di S, Cappello F, Chuah E. Sentiment Analysis based Error Detection for Large-Scale Systems. 2021. Paper presented at The 51st IEEE/IFIP International Conference on Dependable Systems and Networks, Taipei, Taiwan, Province of China.

Author

Alharthi, Khalid ; Jhumka, Arshad ; Di, Sheng et al. / Sentiment Analysis based Error Detection for Large-Scale Systems. Paper presented at The 51st IEEE/IFIP International Conference on Dependable Systems and Networks, Taipei, Taiwan, Province of China.

Bibtex

@conference{3e8afd0c5be2486d8db349c64467baed,

title = "Sentiment Analysis based Error Detection for Large-Scale Systems",

abstract = "Today{\textquoteright}s large-scale systems, such as High Performance Computing (HPC) systems, are designed and utilized towards exascale computing, which inevitably decreases their reliability due to increasing design complexity. HPC systems conduct extensive logging of their execution behaviour. In this paper, we leverage the inherent meaning behind the log messages and propose a novel sentiment analysis-based approach for error detection in large-scale systems, by automatically mining the sentiments in the log messages. Our contributions are four-fold. (1) We develop a machine learning (ML) based approach to automatically build a sentiment lexicon, based on the system log message templates. (2) Using the sentiment lexicon, we develop an algorithm to detect system errors. (3) We develop an algorithm to identify the nodes and components with erroneous behaviors, based on sentiment polarity scores. (4) We evaluate our solution against other state-of-the-art machine/deep learning algorithms based on three representative supercomputers{\textquoteright} system logs. Experiments show that our error detection algorithm can identify error messages with an average MCC score and F-score of 91% and 96% respectively, while a state-of-the-art ML/deep learning model (LSTM) obtains only 67% and 84%. To the best of our knowledge, this is the first work leveraging the sentiments embedded in log entries of large-scale systems for system health analysis.",

author = "Khalid Alharthi and Arshad Jhumka and Sheng Di and Franck Cappello and Edward Chuah",

year = "2021",

month = jun,

day = "24",

language = "English",

note = "The 51st IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2021 ; Conference date: 21-06-2021 Through 24-06-2021",

url = "https://dsn2021.ntu.edu.tw/",

}

RIS

TY - CONF

T1 - Sentiment Analysis based Error Detection for Large-Scale Systems

AU - Alharthi, Khalid

AU - Jhumka, Arshad

AU - Di, Sheng

AU - Cappello, Franck

AU - Chuah, Edward

PY - 2021/6/24

Y1 - 2021/6/24

N2 - Today’s large-scale systems, such as High Performance Computing (HPC) systems, are designed and utilized towards exascale computing, which inevitably decreases their reliability due to increasing design complexity. HPC systems conduct extensive logging of their execution behaviour. In this paper, we leverage the inherent meaning behind the log messages and propose a novel sentiment analysis-based approach for error detection in large-scale systems, by automatically mining the sentiments in the log messages. Our contributions are four-fold. (1) We develop a machine learning (ML) based approach to automatically build a sentiment lexicon, based on the system log message templates. (2) Using the sentiment lexicon, we develop an algorithm to detect system errors. (3) We develop an algorithm to identify the nodes and components with erroneous behaviors, based on sentiment polarity scores. (4) We evaluate our solution against other state-of-the-art machine/deep learning algorithms based on three representative supercomputers’ system logs. Experiments show that our error detection algorithm can identify error messages with an average MCC score and F-score of 91% and 96% respectively, while a state-of-the-art ML/deep learning model (LSTM) obtains only 67% and 84%. To the best of our knowledge, this is the first work leveraging the sentiments embedded in log entries of large-scale systems for system health analysis.

AB - Today’s large-scale systems, such as High Performance Computing (HPC) systems, are designed and utilized towards exascale computing, which inevitably decreases their reliability due to increasing design complexity. HPC systems conduct extensive logging of their execution behaviour. In this paper, we leverage the inherent meaning behind the log messages and propose a novel sentiment analysis-based approach for error detection in large-scale systems, by automatically mining the sentiments in the log messages. Our contributions are four-fold. (1) We develop a machine learning (ML) based approach to automatically build a sentiment lexicon, based on the system log message templates. (2) Using the sentiment lexicon, we develop an algorithm to detect system errors. (3) We develop an algorithm to identify the nodes and components with erroneous behaviors, based on sentiment polarity scores. (4) We evaluate our solution against other state-of-the-art machine/deep learning algorithms based on three representative supercomputers’ system logs. Experiments show that our error detection algorithm can identify error messages with an average MCC score and F-score of 91% and 96% respectively, while a state-of-the-art ML/deep learning model (LSTM) obtains only 67% and 84%. To the best of our knowledge, this is the first work leveraging the sentiments embedded in log entries of large-scale systems for system health analysis.

M3 - Conference paper

T2 - The 51st IEEE/IFIP International Conference on Dependable Systems and Networks

Y2 - 21 June 2021 through 24 June 2021

ER -
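
Illustrative sketch

The abstract describes scoring log messages against a sentiment lexicon, flagging errors from polarity, ranking nodes by polarity score, and evaluating with MCC and F-score. The following is a minimal Python sketch of that general idea, not the authors' implementation. The lexicon entries, the threshold value, the sample messages and all function names are assumptions made for illustration; the paper learns its lexicon from log message templates with an ML model, which is not reproduced here.

```python
import math
from collections import defaultdict

# Hypothetical hand-written lexicon (assumption). The paper builds its
# lexicon automatically from log message templates instead.
LEXICON = {
    "failed": -1.0, "error": -1.0, "fatal": -1.0, "timeout": -0.5,
    "completed": 0.5, "ok": 0.5, "started": 0.2,
}

def polarity(message):
    """Mean lexicon score of the known tokens in a message (0.0 if none)."""
    scores = [LEXICON[t] for t in message.lower().split() if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def is_error(message, threshold=-0.25):
    """Flag a message as an error when its polarity falls below the threshold.

    The threshold is an illustrative value, not one taken from the paper.
    """
    return polarity(message) < threshold

def node_scores(entries):
    """Sum message polarity per node; entries are (node, message) pairs."""
    totals = defaultdict(float)
    for node, message in entries:
        totals[node] += polarity(message)
    return dict(totals)

def mcc_and_f1(y_true, y_pred):
    """Matthews correlation coefficient and F1 for binary labels (1 = error)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return mcc, f1

# Made-up log entries for demonstration only.
LOGS = [
    ("n1", "job started ok"),
    ("n2", "link failed fatal error"),
    ("n2", "request timeout"),
    ("n1", "checkpoint completed"),
]

if __name__ == "__main__":
    predicted = [int(is_error(msg)) for _, msg in LOGS]
    print(predicted)
    print(node_scores(LOGS))
    print(mcc_and_f1([0, 1, 1, 0], predicted))
```

In this sketch the node with the most negative summed polarity would be the first candidate for inspection. A real system would replace the hand-written lexicon with a learned one and tune the threshold on labelled logs.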
