Home > Research > Publications & Outputs > Hatch:

Electronic data

  • HATCH

    Rights statement: This is the author’s version of a work that was accepted for publication in Future Generation Computer Systems. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Future Generation Computer Systems, 132, 80-92, 2022 DOI:10.1016/j.future.2022.02.008

    Accepted author manuscript, 701 KB, PDF document

    Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Links

Text available via DOI:

View graph of relations

Hatch: Self-Distributing Systems for Data Centers

Research output: Contribution to Journal/MagazineJournal articlepeer-review

E-pub ahead of print
<mark>Journal publication date</mark>18/02/2022
<mark>Journal</mark>Future Generation Computer Systems
Volume132
Number of pages13
Pages (from-to)80-92
Publication StatusE-pub ahead of print
Early online date18/02/22
<mark>Original language</mark>English

Abstract

Designing and maintaining distributed systems remains highly challenging: there is a high-dimensional design space of potential ways to distribute a system’s sub-components over a large-scale infrastructure; and the deployment environment for a system tends to change in unforeseen ways over time. For engineers, this is a complex prediction problem to gauge which distributed design may best suit a given environment. We present the concept of self-distributing systems, in which any local system built using our framework can learn, at runtime, the most appropriate distributed design given
its perceived operating conditions. Our concept abstracts distribution of a system’s sub-components to a list of simple actions in a reward matrix of distributed design alternatives to be used by reinforcement learning algorithms. By doing this, we enable software to experiment, in a live production
environment, with different ways in which to distribute its software modules by placing them in different hosts throughout the system’s infrastructure. We implement this concept in a framework we call Hatch, which has three major elements: (i) a transparent and generalized RPC layer that supports
seamless relocation of any local component to a remote host during execution; (ii) a set of primitives, including relocation, replication and sharding, from which to create an action/reward matrix of possible distributed designs of a system; and (iii) a decentralized reinforcement learning approach to converge towards more optimal designs in real time. Using an example of a self-distributing webserving infrastructure, Hatch is able to autonomously select the most suitable distributed design from among ù700,000 alternatives in about 5 minutes.

Bibliographic note

This is the author’s version of a work that was accepted for publication in Future Generation Computer Systems. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Future Generation Computer Systems, 132, 80-92, 2022 DOI: 10.1016/j.future.2022.02.008