SLA-based adaptation schemes in distributed stream processing engines

Computing and Communications

Associated organisational unit

Digital Health Group

Text available via DOI:

https://doi.org/10.3390/app9061045
Final published version
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Keywords

Big data, Cloud computing, Distributed computing, Modern stream processing engine, SLA, Watermarking

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

SLA-based adaptation schemes in distributed stream processing engines. / Hanif, M.; Kim, E.; Helal, S. et al.
In: Applied Sciences, Vol. 9, No. 6, 1045, 13.03.2019.

Harvard

Hanif, M, Kim, E, Helal, S & Lee, C 2019, 'SLA-based adaptation schemes in distributed stream processing engines', Applied Sciences, vol. 9, no. 6, 1045. https://doi.org/10.3390/app9061045

APA

Hanif, M., Kim, E., Helal, S., & Lee, C. (2019). SLA-based adaptation schemes in distributed stream processing engines. Applied Sciences, 9(6), Article 1045. https://doi.org/10.3390/app9061045

Vancouver

Hanif M, Kim E, Helal S, Lee C. SLA-based adaptation schemes in distributed stream processing engines. Applied Sciences. 2019 Mar 13;9(6):1045. doi: 10.3390/app9061045

Author

Hanif, M. ; Kim, E. ; Helal, S. et al. / SLA-based adaptation schemes in distributed stream processing engines. In: Applied Sciences. 2019 ; Vol. 9, No. 6.

Bibtex

@article{89173b92821b4fd8925e87688d7b3cc4,

title = "SLA-based adaptation schemes in distributed stream processing engines",

abstract = "With the upswing in the volume of data, information online, and magnanimous cloud applications, big data analytics becomes mainstream in the research communities in the industry as well as in the scholarly world. This prompted the emergence and development of real-time distributed stream processing frameworks, such as Flink, Storm, Spark, and Samza. These frameworks endorse complex queries on streaming data to be distributed across multiple worker nodes in a cluster. Few of these stream processing frameworks provides fundamental support for controlling the latency and throughput of the system as well as the correctness of the results. However, none has the ability to handle them on the fly at runtime. We present a well-informed and efficient adaptive watermarking and dynamic buffering timeout mechanism for the distributed streaming frameworks. It is designed to increase the overall throughput of the system by making the watermarks adaptive towards the stream of incoming workload, and scale the buffering timeout dynamically for each task tracker on the fly while maintaining the Service Level Agreement (SLA)-based end-to-end latency of the system. This work focuses on tuning the parameters of the system (such as window correctness, buffering timeout, and so on) based on the prediction of incoming workloads and assesses whether a given workload will breach an SLA using output metrics including latency, throughput, and correctness of both intermediate and final results. We used Apache Flink as our testbed distributed processing engine for this work. However, the proposed mechanism can be applied to other streaming frameworks as well. Our results on the testbed model indicate that the proposed system outperforms the status quo of stream processing. 
With the inclusion of learning models like na{\"i}ve Bayes, multilayer perceptron (MLP), and sequential minimal optimization (SMO), the system shows more progress in terms of keeping the SLA intact as well as quality of service (QoS).",

keywords = "Big data, Cloud computing, Distributed computing, Modern stream processing engine, SLA, Watermarking",

author = "M. Hanif and E. Kim and S. Helal and C. Lee",

year = "2019",

month = mar,

day = "13",

doi = "10.3390/app9061045",

language = "English",

volume = "9",

journal = "Applied Sciences",

issn = "2076-3417",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "6",

}

RIS

TY - JOUR

T1 - SLA-based adaptation schemes in distributed stream processing engines

AU - Hanif, M.

AU - Kim, E.

AU - Helal, S.

AU - Lee, C.

PY - 2019/3/13

Y1 - 2019/3/13

N2 - With the upswing in the volume of data, information online, and magnanimous cloud applications, big data analytics becomes mainstream in the research communities in the industry as well as in the scholarly world. This prompted the emergence and development of real-time distributed stream processing frameworks, such as Flink, Storm, Spark, and Samza. These frameworks endorse complex queries on streaming data to be distributed across multiple worker nodes in a cluster. Few of these stream processing frameworks provides fundamental support for controlling the latency and throughput of the system as well as the correctness of the results. However, none has the ability to handle them on the fly at runtime. We present a well-informed and efficient adaptive watermarking and dynamic buffering timeout mechanism for the distributed streaming frameworks. It is designed to increase the overall throughput of the system by making the watermarks adaptive towards the stream of incoming workload, and scale the buffering timeout dynamically for each task tracker on the fly while maintaining the Service Level Agreement (SLA)-based end-to-end latency of the system. This work focuses on tuning the parameters of the system (such as window correctness, buffering timeout, and so on) based on the prediction of incoming workloads and assesses whether a given workload will breach an SLA using output metrics including latency, throughput, and correctness of both intermediate and final results. We used Apache Flink as our testbed distributed processing engine for this work. However, the proposed mechanism can be applied to other streaming frameworks as well. Our results on the testbed model indicate that the proposed system outperforms the status quo of stream processing. 
With the inclusion of learning models like naïve Bayes, multilayer perceptron (MLP), and sequential minimal optimization (SMO), the system shows more progress in terms of keeping the SLA intact as well as quality of service (QoS).

AB - With the upswing in the volume of data, information online, and magnanimous cloud applications, big data analytics becomes mainstream in the research communities in the industry as well as in the scholarly world. This prompted the emergence and development of real-time distributed stream processing frameworks, such as Flink, Storm, Spark, and Samza. These frameworks endorse complex queries on streaming data to be distributed across multiple worker nodes in a cluster. Few of these stream processing frameworks provides fundamental support for controlling the latency and throughput of the system as well as the correctness of the results. However, none has the ability to handle them on the fly at runtime. We present a well-informed and efficient adaptive watermarking and dynamic buffering timeout mechanism for the distributed streaming frameworks. It is designed to increase the overall throughput of the system by making the watermarks adaptive towards the stream of incoming workload, and scale the buffering timeout dynamically for each task tracker on the fly while maintaining the Service Level Agreement (SLA)-based end-to-end latency of the system. This work focuses on tuning the parameters of the system (such as window correctness, buffering timeout, and so on) based on the prediction of incoming workloads and assesses whether a given workload will breach an SLA using output metrics including latency, throughput, and correctness of both intermediate and final results. We used Apache Flink as our testbed distributed processing engine for this work. However, the proposed mechanism can be applied to other streaming frameworks as well. Our results on the testbed model indicate that the proposed system outperforms the status quo of stream processing. 
With the inclusion of learning models like naïve Bayes, multilayer perceptron (MLP), and sequential minimal optimization (SMO), the system shows more progress in terms of keeping the SLA intact as well as quality of service (QoS).

KW - Big data

KW - Cloud computing

KW - Distributed computing

KW - Modern stream processing engine

KW - SLA

KW - Watermarking

U2 - 10.3390/app9061045

DO - 10.3390/app9061045

M3 - Journal article

VL - 9

JO - Applied Sciences

JF - Applied Sciences

SN - 2076-3417

IS - 6

M1 - 1045

ER -
