The Design Space of Emergent Scheduling for Distributed Execution Frameworks

Computing and Communications

Rights statement: ©2022 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Text available via DOI:

https://doi.org/10.1109/SEAMS51251.2021.00032
Final published version

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

The Design Space of Emergent Scheduling for Distributed Execution Frameworks. / Dean, Paul; Porter, Barry.
Symposium on Software Engineering for Adaptive and Self-Managing Systems. IEEE, 2021. p. 186-195.

Harvard

Dean, P & Porter, B 2021, The Design Space of Emergent Scheduling for Distributed Execution Frameworks. in Symposium on Software Engineering for Adaptive and Self-Managing Systems. IEEE, pp. 186-195, SEAMS International Workshop on Software Engineering for Adaptive and Self-Managing Systems, ICSE, Madrid, Spain, 18/05/21. https://doi.org/10.1109/SEAMS51251.2021.00032

APA

Dean, P., & Porter, B. (2021). The Design Space of Emergent Scheduling for Distributed Execution Frameworks. In Symposium on Software Engineering for Adaptive and Self-Managing Systems (pp. 186-195). IEEE. https://doi.org/10.1109/SEAMS51251.2021.00032

Vancouver

Dean P, Porter B. The Design Space of Emergent Scheduling for Distributed Execution Frameworks. In Symposium on Software Engineering for Adaptive and Self-Managing Systems. IEEE. 2021. p. 186-195. Epub 2021 May 24. doi: 10.1109/SEAMS51251.2021.00032

Author

Dean, Paul ; Porter, Barry. / The Design Space of Emergent Scheduling for Distributed Execution Frameworks. Symposium on Software Engineering for Adaptive and Self-Managing Systems. IEEE, 2021. pp. 186-195

Bibtex

@inproceedings{6b70c0cee16e4462b81bc8ca68a7f1dd,

title = "The Design Space of Emergent Scheduling for Distributed Execution Frameworks",

abstract = "Distributed Execution Frameworks (DEFs) such as Apache Spark have become ubiquitous as a solution for the execution of user-defined jobs to process terabytes of data across hundreds of nodes. One of the key costs of DEFs is scheduling of which parts of each job are placed on each host; better scheduling decisions provide lower overall execution time for each job, more efficient resource usage, and reduced energy consumption. Existing DEFs use a static approach to scheduling, either with a single generalised scheduler which aims to be a good fit for most workloads, or with a special-purpose scheduler which is tuned to optimise for a particular kind of workload. In both cases the scheduling implementation is fixed at design-time such that the DEF is unable to adjust to the actual characteristics of workloads that arrive at deployment time. In this paper we introduce an emergent scheduler for Distributed Execution Frameworks. This scheduler can be composed and re-composed at runtime from a set of different building blocks, allowing the system to dynamically provide the benefits of differing scheduling policies over time depending on the actual properties of incoming workloads - with improved performance and resource usage. In this paper we present the overall design of our emergent scheduler, we discuss the theoretical design space of different scheduling approaches, and we examine a specific research question to determine the correlation between workload properties and scheduling performance for different scheduler implementations. Our results are based on a real implementation of our emergent DEF running across multiple hosts in a real datacentre, and our implementation is made available as open-source software.",

author = "Paul Dean and Barry Porter",

note = "{\textcopyright}2022 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.; SEAMS International Workshop on Software Engineering for Adaptive and Self-Managing Systems, ICSE ; Conference date: 18-05-2021 Through 18-05-2022",

year = "2021",

month = jun,

day = "29",

doi = "10.1109/SEAMS51251.2021.00032",

language = "English",

isbn = "9781665402903",

pages = "186--195",

booktitle = "Symposium on Software Engineering for Adaptive and Self-Managing Systems",

publisher = "IEEE",

url = "https://ieeexplore.ieee.org/xpl/conhome/9461924/proceeding",

}

RIS

TY - GEN

T1 - The Design Space of Emergent Scheduling for Distributed Execution Frameworks

AU - Dean, Paul

AU - Porter, Barry

N1 - ©2022 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

PY - 2021/6/29

Y1 - 2021/6/29

AB - Distributed Execution Frameworks (DEFs) such as Apache Spark have become ubiquitous as a solution for the execution of user-defined jobs to process terabytes of data across hundreds of nodes. One of the key costs of DEFs is scheduling of which parts of each job are placed on each host; better scheduling decisions provide lower overall execution time for each job, more efficient resource usage, and reduced energy consumption. Existing DEFs use a static approach to scheduling, either with a single generalised scheduler which aims to be a good fit for most workloads, or with a special-purpose scheduler which is tuned to optimise for a particular kind of workload. In both cases the scheduling implementation is fixed at design-time such that the DEF is unable to adjust to the actual characteristics of workloads that arrive at deployment time. In this paper we introduce an emergent scheduler for Distributed Execution Frameworks. This scheduler can be composed and re-composed at runtime from a set of different building blocks, allowing the system to dynamically provide the benefits of differing scheduling policies over time depending on the actual properties of incoming workloads - with improved performance and resource usage. In this paper we present the overall design of our emergent scheduler, we discuss the theoretical design space of different scheduling approaches, and we examine a specific research question to determine the correlation between workload properties and scheduling performance for different scheduler implementations. Our results are based on a real implementation of our emergent DEF running across multiple hosts in a real datacentre, and our implementation is made available as open-source software.

U2 - 10.1109/SEAMS51251.2021.00032

DO - 10.1109/SEAMS51251.2021.00032

M3 - Conference contribution/Paper

SN - 9781665402903

SP - 186

EP - 195

BT - Symposium on Software Engineering for Adaptive and Self-Managing Systems

PB - IEEE

T2 - SEAMS International Workshop on Software Engineering for Adaptive and Self-Managing Systems, ICSE

Y2 - 18 May 2021 through 18 May 2022

ER -
