
Electronic data

  • tpds2020-rose

    Rights statement: ©2020 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

    Accepted author manuscript, 3.51 MB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License


Performance-aware Speculative Resource Oversubscription for Large-scale Clusters

Research output: Contribution to Journal/Magazine, Journal article, peer-review

Published
  • Renyu Yang
  • Xiaoyang Sun
  • Chunming Hu
  • Peter Garraghan
  • Tianyu Wo
  • Zhenyu Wen
  • Hao Peng
  • Jie Xu
  • Chao Li
Journal publication date: 28/01/2020
Journal: IEEE Transactions on Parallel and Distributed Systems
Issue number: 7
Volume: 31
Number of pages: 19
Pages (from-to): 1499-1517
Publication Status: Published
Original language: English

Abstract

It is a long-standing challenge to achieve a high degree of resource utilization in cluster scheduling. Resource oversubscription has become a common practice in improving resource utilization and cost reduction. However, current centralized approaches to oversubscription suffer from the issue with resource mismatch and fail to take into account other performance requirements, e.g., tail latency. In this paper we present ROSE, a new resource management platform capable of conducting performance-aware resource oversubscription. ROSE allows latency-sensitive long-running applications (LRAs) to co-exist with computation-intensive batch jobs. Instead of waiting for resource allocation to be confirmed by the centralized scheduler, job managers in ROSE can independently request to launch speculative tasks within specific machines according to their suitability for oversubscription. Node agents of those machines can however avoid any excessive resource oversubscription by means of a mechanism for admission control using multi-resource threshold control and performance-aware resource throttle. Experiments show that in case of mixed co-location of batch jobs and latency-sensitive LRAs, the CPU utilization and the disk utilization can reach 56.34% and 43.49%, respectively, but the 95th percentile of read latency in YCSB workloads only increases by 5.4% against the case of executing the LRAs alone.
