
Improved Evaluation of Automatic Source Code Summarisation

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN > Conference contribution/Paper > peer-review

Published

Standard

Improved Evaluation of Automatic Source Code Summarisation. / Phillips, Jesse; Bowes, David; El-Haj, Mahmoud et al.
2nd Workshop on Natural Language Generation, Evaluation and Metrics: Proceedings of the Workshop. Stroudsburg, PA: Association for Computational Linguistics (ACL Anthology), 2022. p. 326-335.

Harvard

Phillips, J, Bowes, D, El-Haj, M & Hall, T 2022, Improved Evaluation of Automatic Source Code Summarisation. in 2nd Workshop on Natural Language Generation, Evaluation and Metrics: Proceedings of the Workshop. Association for Computational Linguistics (ACL Anthology), Stroudsburg, PA, pp. 326-335, 2nd Workshop on Natural Language Generation, Evaluation, and Metrics, Abu Dhabi, United Arab Emirates, 7/12/22. <https://aclanthology.org/2022.gem-1.28.pdf>

APA

Phillips, J., Bowes, D., El-Haj, M., & Hall, T. (2022). Improved Evaluation of Automatic Source Code Summarisation. In 2nd Workshop on Natural Language Generation, Evaluation and Metrics: Proceedings of the Workshop (pp. 326-335). Association for Computational Linguistics (ACL Anthology). https://aclanthology.org/2022.gem-1.28.pdf

Vancouver

Phillips J, Bowes D, El-Haj M, Hall T. Improved Evaluation of Automatic Source Code Summarisation. In 2nd Workshop on Natural Language Generation, Evaluation and Metrics: Proceedings of the Workshop. Stroudsburg, PA: Association for Computational Linguistics (ACL Anthology). 2022. p. 326-335

Author

Phillips, Jesse ; Bowes, David ; El-Haj, Mahmoud et al. / Improved Evaluation of Automatic Source Code Summarisation. 2nd Workshop on Natural Language Generation, Evaluation and Metrics: Proceedings of the Workshop. Stroudsburg, PA : Association for Computational Linguistics (ACL Anthology), 2022. pp. 326-335

BibTeX

@inproceedings{d08d7bf8c3b54a6da320d3c3e435a05f,
title = "Improved Evaluation of Automatic Source Code Summarisation",
abstract = "Source code summaries are a vital tool for the understanding and maintenance of source code as they can be used to explain code in simple terms. However, source code with missing, incorrect, or outdated summaries is a common occurrence in production code. Automatic source code summarisation seeks to solve these issues by generating up-to-date summaries of source code methods. Recent work in automatically generating source code summaries uses neural networks for generating summaries, commonly Sequence-to-Sequence or Transformer models, pretrained on method-summary pairs. The most common method of evaluating the quality of these summaries is comparing the machine-generated summaries against human-written summaries. Summaries can be evaluated using n-gram-based translation metrics such as BLEU, METEOR, or ROUGE-L. However, these metrics alone can be unreliable, and new Natural Language Generation metrics based on large pretrained language models provide an alternative. In this paper, we propose a method of improving the evaluation of a model by improving the preprocessing of the data used to train it, as well as evaluating the model with a metric based on a language model pretrained on a natural language (English), alongside traditional metrics. Our evaluation suggests our model has been improved by cleaning and preprocessing the data used in model training. The addition of a pretrained language model metric alongside traditional metrics shows that both produce results which can be used to evaluate neural source code summarisation.",
author = "Jesse Phillips and David Bowes and Mahmoud El-Haj and Tracy Hall",
year = "2022",
month = dec,
day = "7",
language = "English",
isbn = "9781959429128",
pages = "326--335",
booktitle = "2nd Workshop on Natural Language Generation, Evaluation and Metrics",
publisher = "Association for Computational Linguistics (ACL Anthology)",
note = "2nd Workshop on Natural Language Generation, Evaluation, and Metrics, GEM ; Conference date: 07-12-2022 Through 09-12-2022",

}
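
The abstract above describes scoring machine-generated method summaries against human-written reference summaries with n-gram overlap metrics such as BLEU, METEOR, or ROUGE-L. As a minimal, illustrative sketch (not taken from the paper), such a comparison can be computed with NLTK's sentence-level BLEU; the example summaries and the whitespace tokenisation below are placeholder assumptions, not data from the paper.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Placeholder reference (human-written) and candidate (machine-generated)
# summaries for illustration only; not from the paper's dataset.
reference = "returns the largest element in the given list".split()
candidate = "return the maximum element of the list".split()

# Smoothing avoids zero scores when higher-order n-grams have no overlap,
# which is common for short method summaries.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")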

RIS

TY - GEN

T1 - Improved Evaluation of Automatic Source Code Summarisation

AU - Phillips, Jesse

AU - Bowes, David

AU - El-Haj, Mahmoud

AU - Hall, Tracy

PY - 2022/12/7

Y1 - 2022/12/7

N2 - Source code summaries are a vital tool for the understanding and maintenance of source code as they can be used to explain code in simple terms. However, source code with missing, incorrect, or outdated summaries is a common occurrence in production code. Automatic source code summarisation seeks to solve these issues by generating up-to-date summaries of source code methods. Recent work in automatically generating source code summaries uses neural networks for generating summaries, commonly Sequence-to-Sequence or Transformer models, pretrained on method-summary pairs. The most common method of evaluating the quality of these summaries is comparing the machine-generated summaries against human-written summaries. Summaries can be evaluated using n-gram-based translation metrics such as BLEU, METEOR, or ROUGE-L. However, these metrics alone can be unreliable, and new Natural Language Generation metrics based on large pretrained language models provide an alternative. In this paper, we propose a method of improving the evaluation of a model by improving the preprocessing of the data used to train it, as well as evaluating the model with a metric based on a language model pretrained on a natural language (English), alongside traditional metrics. Our evaluation suggests our model has been improved by cleaning and preprocessing the data used in model training. The addition of a pretrained language model metric alongside traditional metrics shows that both produce results which can be used to evaluate neural source code summarisation.

AB - Source code summaries are a vital tool for the understanding and maintenance of source code as they can be used to explain code in simple terms. However, source code with missing, incorrect, or outdated summaries is a common occurrence in production code. Automatic source code summarisation seeks to solve these issues by generating up-to-date summaries of source code methods. Recent work in automatically generating source code summaries uses neural networks for generating summaries, commonly Sequence-to-Sequence or Transformer models, pretrained on method-summary pairs. The most common method of evaluating the quality of these summaries is comparing the machine-generated summaries against human-written summaries. Summaries can be evaluated using n-gram-based translation metrics such as BLEU, METEOR, or ROUGE-L. However, these metrics alone can be unreliable, and new Natural Language Generation metrics based on large pretrained language models provide an alternative. In this paper, we propose a method of improving the evaluation of a model by improving the preprocessing of the data used to train it, as well as evaluating the model with a metric based on a language model pretrained on a natural language (English), alongside traditional metrics. Our evaluation suggests our model has been improved by cleaning and preprocessing the data used in model training. The addition of a pretrained language model metric alongside traditional metrics shows that both produce results which can be used to evaluate neural source code summarisation.

M3 - Conference contribution/Paper

SN - 9781959429128

SP - 326

EP - 335

BT - 2nd Workshop on Natural Language Generation, Evaluation and Metrics

PB - Association for Computational Linguistics (ACL Anthology)

CY - Stroudsburg, PA

T2 - 2nd Workshop on Natural Language Generation, Evaluation, and Metrics

Y2 - 7 December 2022 through 9 December 2022

ER -