Code and Data Synthesis for Genetic Improvement in Emergent Software Systems

Computing and Communications

Associated organisational unit

Centre of Excellence in Environmental Data Science

Electronic data

main
Rights statement: © ACM, 2022. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Evolutionary Learning and Optimization, 2, 2, (30/06/2022) https://doi.org/10.1145/3542823
Accepted author manuscript, 1.77 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Text available via DOI:

https://doi.org/10.1145/3542823
Final published version

Keywords

genetic improvement, optimization, emergent systems, data synthesis, data sampling, fitness function, language

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Code and Data Synthesis for Genetic Improvement in Emergent Software Systems. / Rainford, Penelope ; Porter, Barry.
In: Transactions on Evolutionary Learning and Optimization, Vol. 2, No. 2, 7, 30.06.2022.

Harvard

Rainford, P & Porter, B 2022, 'Code and Data Synthesis for Genetic Improvement in Emergent Software Systems', Transactions on Evolutionary Learning and Optimization, vol. 2, no. 2, 7. https://doi.org/10.1145/3542823

APA

Rainford, P., & Porter, B. (2022). Code and Data Synthesis for Genetic Improvement in Emergent Software Systems. Transactions on Evolutionary Learning and Optimization, 2(2), Article 7. https://doi.org/10.1145/3542823

Vancouver

Rainford P, Porter B. Code and Data Synthesis for Genetic Improvement in Emergent Software Systems. Transactions on Evolutionary Learning and Optimization. 2022 Jun 30;2(2):7. Epub 2022 Jun 11. doi: 10.1145/3542823

Author

Rainford, Penelope ; Porter, Barry. / Code and Data Synthesis for Genetic Improvement in Emergent Software Systems. In: Transactions on Evolutionary Learning and Optimization. 2022 ; Vol. 2, No. 2.

Bibtex

@article{03e51b39ee92440a90b9b2c1046ccfc5,

title = "Code and Data Synthesis for Genetic Improvement in Emergent Software Systems",

abstract = "Emergent software systems are assembled from a collection of small code blocks, where some of those blocks have alternative implementation variants; they optimise at run-time by learning which compositions of alternative blocks best suit each deployment environment encountered. In this paper we study the automated synthesis of new implementation variants for a running system using genetic improvement (GI). Typical GI approaches, however, rely on large amounts of data for accurate training and large code bases from which to source genetic material. In emergent systems we have neither asset, with sparsely sampled runtime data and small code volumes in each building block. We therefore examine two approaches to more effective GI under these constraints: the synthesis of data from sparse samples to construct statistically representative larger training corpora; and the synthesis of code to counter the relative lack of genetic material in our starting population members. Our results demonstrate that a mixture of synthesised and existing code is a viable optimisation strategy, and that phases of increased synthesis can make GI more robust to deleterious mutations. On synthesised data, we find that we can produce equivalent optimisation compared to GI methods using larger data sets, and that this optimisation can produce both useful specialists and generalists.",

keywords = "genetic improvement, optimization, emergent systems, data synthesis, data sampling, fitness function, language",

author = "Penelope Rainford and Barry Porter",

note = "{\textcopyright} ACM, 2022. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Evolutionary Learning and Optimization, 2, 2, (30/06/2022) https://doi.org/10.1145/3542823",

year = "2022",

month = jun,

day = "30",

doi = "10.1145/3542823",

language = "English",

volume = "2",

journal = "Transactions on Evolutionary Learning and Optimization",

publisher = "ACM",

number = "2",

}

RIS

TY - JOUR

T1 - Code and Data Synthesis for Genetic Improvement in Emergent Software Systems

AU - Rainford, Penelope

AU - Porter, Barry

N1 - © ACM, 2022. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Evolutionary Learning and Optimization, 2, 2, (30/06/2022) https://doi.org/10.1145/3542823

PY - 2022/6/30

Y1 - 2022/6/30

AB - Emergent software systems are assembled from a collection of small code blocks, where some of those blocks have alternative implementation variants; they optimise at run-time by learning which compositions of alternative blocks best suit each deployment environment encountered. In this paper we study the automated synthesis of new implementation variants for a running system using genetic improvement (GI). Typical GI approaches, however, rely on large amounts of data for accurate training and large code bases from which to source genetic material. In emergent systems we have neither asset, with sparsely sampled runtime data and small code volumes in each building block. We therefore examine two approaches to more effective GI under these constraints: the synthesis of data from sparse samples to construct statistically representative larger training corpora; and the synthesis of code to counter the relative lack of genetic material in our starting population members. Our results demonstrate that a mixture of synthesised and existing code is a viable optimisation strategy, and that phases of increased synthesis can make GI more robust to deleterious mutations. On synthesised data, we find that we can produce equivalent optimisation compared to GI methods using larger data sets, and that this optimisation can produce both useful specialists and generalists.

KW - genetic improvement

KW - optimization

KW - emergent systems

KW - data synthesis

KW - data sampling

KW - fitness function

KW - language

U2 - 10.1145/3542823

DO - 10.1145/3542823

M3 - Journal article

VL - 2

JO - Transactions on Evolutionary Learning and Optimization

JF - Transactions on Evolutionary Learning and Optimization

IS - 2

M1 - 7

ER -
