Metric-Oriented Pretraining of Neural Source Code Summarisation Transformers to Enable more Secure Software Development

Computing and Communications

Research output: Contribution to conference - Without ISBN/ISSN › Conference paper › peer-review

Published

Standard

Metric-Oriented Pretraining of Neural Source Code Summarisation Transformers to Enable more Secure Software Development. / Phillips, Jesse ; El-Haj, Mo ; Hall, Tracy.
2024. 17-31. Paper presented at The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, Lancaster, United Kingdom.

Harvard

Phillips, J, El-Haj, M & Hall, T 2024, 'Metric-Oriented Pretraining of Neural Source Code Summarisation Transformers to Enable more Secure Software Development', Paper presented at The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, Lancaster, United Kingdom, 29/07/24 - 30/07/24, pp. 17-31.

APA

Phillips, J., El-Haj, M., & Hall, T. (2024). Metric-Oriented Pretraining of Neural Source Code Summarisation Transformers to Enable more Secure Software Development. 17-31. Paper presented at The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, Lancaster, United Kingdom.

Vancouver

Phillips J, El-Haj M, Hall T. Metric-Oriented Pretraining of Neural Source Code Summarisation Transformers to Enable more Secure Software Development. 2024. Paper presented at The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, Lancaster, United Kingdom.

Author

Phillips, Jesse ; El-Haj, Mo ; Hall, Tracy. / Metric-Oriented Pretraining of Neural Source Code Summarisation Transformers to Enable more Secure Software Development. Paper presented at The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, Lancaster, United Kingdom. 15 p.

Bibtex

@conference{268af1dc39184bf5bcc8e0731c266140,
  title = "Metric-Oriented Pretraining of Neural Source Code Summarisation Transformers to Enable more Secure Software Development",
  abstract = "Source code summaries give developers and maintainers vital information about source code methods. These summaries aid with the security of software systems as they can be used to improve developer and maintainer understanding of code, with the aim of reducing the number of bugs and vulnerabilities. However writing these summaries takes up the developers{\textquoteright} time and these summaries are often missing, incomplete, or outdated. Neural source code summarisation solves these issues by summarising source code automatically. Current solutions use Transformer neural networks to achieve this. We present CodeSumBART - a BART-base model for neural source code summarisation, pretrained on a dataset of Java source code methods and English method summaries. We present a new approach to training Transformers for neural source code summarisation. We found that in our approach, using larger n-gram precision BLEU metrics for epoch validation, such as BLEU-4, produces better performing models than other common NLG metrics.",
  author = "Jesse Phillips and Mo El-Haj and Tracy Hall",
  year = "2024",
  month = jul,
  day = "30",
  language = "English",
  pages = "17--31",
  note = "The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, NLPAICS ; Conference date: 29-07-2024 Through 30-07-2024",
  url = "https://nlpaics.com/",
}

RIS

TY - CONF
T1 - Metric-Oriented Pretraining of Neural Source Code Summarisation Transformers to Enable more Secure Software Development
AU - Phillips, Jesse
AU - El-Haj, Mo
AU - Hall, Tracy
PY - 2024/7/30
Y1 - 2024/7/30
N2 - Source code summaries give developers and maintainers vital information about source code methods. These summaries aid with the security of software systems as they can be used to improve developer and maintainer understanding of code, with the aim of reducing the number of bugs and vulnerabilities. However writing these summaries takes up the developers’ time and these summaries are often missing, incomplete, or outdated. Neural source code summarisation solves these issues by summarising source code automatically. Current solutions use Transformer neural networks to achieve this. We present CodeSumBART - a BART-base model for neural source code summarisation, pretrained on a dataset of Java source code methods and English method summaries. We present a new approach to training Transformers for neural source code summarisation. We found that in our approach, using larger n-gram precision BLEU metrics for epoch validation, such as BLEU-4, produces better performing models than other common NLG metrics.
AB - Source code summaries give developers and maintainers vital information about source code methods. These summaries aid with the security of software systems as they can be used to improve developer and maintainer understanding of code, with the aim of reducing the number of bugs and vulnerabilities. However writing these summaries takes up the developers’ time and these summaries are often missing, incomplete, or outdated. Neural source code summarisation solves these issues by summarising source code automatically. Current solutions use Transformer neural networks to achieve this. We present CodeSumBART - a BART-base model for neural source code summarisation, pretrained on a dataset of Java source code methods and English method summaries. We present a new approach to training Transformers for neural source code summarisation. We found that in our approach, using larger n-gram precision BLEU metrics for epoch validation, such as BLEU-4, produces better performing models than other common NLG metrics.
M3 - Conference paper
SP - 17
EP - 31
T2 - The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Y2 - 29 July 2024 through 30 July 2024
ER -
