
Electronic data

  • lctes

    Rights statement: © ACM, 2020. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in LCTES '20: The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, (2020) https://dl.acm.org/doi/10.1145/3372799.3394370

    Accepted author manuscript, 0.98 MB, PDF document

    Available under license: Unspecified


Performance Optimization on big.LITTLE Architectures: A Memory-latency Aware Approach

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published
Publication date: 1/06/2020
Host publication: LCTES '20: The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems
Place of Publication: New York
Publisher: ACM
Pages: 51–61
Number of pages: 11
ISBN (print): 9781450370943
Original language: English

Abstract

The energy demands of modern mobile devices have driven a trend towards heterogeneous multi-core systems that include several types of core tuned for performance or energy efficiency, offering a rich optimization space for software. On such systems, data coherency between cores is automatically ensured by an interconnect between processors. On some chip designs, the performance of this interconnect, and by extension of the entire CPU cluster, depends heavily on the software's memory access characteristics and on the frequencies set for each CPU core. Existing frequency scaling mechanisms in operating systems use a simple load-based heuristic to tune CPU frequencies, and so fail to achieve a holistically good configuration across such diverse clusters. To solve this problem, we propose a new adaptive governor that uses a simple trained hardware model of cache interconnect characteristics, along with real-time hardware monitors, to continually adjust core frequencies to maximize system performance. We evaluate our governor on the Exynos5422 SoC, as used in the Samsung Galaxy S5, across a range of standard benchmarks. The evaluation shows that our approach achieves a speedup of up to 40% and a 70% energy saving, including a 30% speedup in common mobile applications such as video decoding and web browsing.
