Electronic data

  • Accepted author manuscript, 412 KB, PDF document


Patch-CLIP: A Patch-Text Pre-Trained Model

Research output: Working paper › Preprint

Published

Standard

Patch-CLIP: A Patch-Text Pre-Trained Model. / Tang, Xunzhu; Chen, Zhenghan; Ezzini, Saad et al.
2023.

Research output: Working paper › Preprint

Harvard

Tang, X, Chen, Z, Ezzini, S, Tian, H, Klein, J & Bissyande, TF 2023 'Patch-CLIP: A Patch-Text Pre-Trained Model'.

APA

Tang, X., Chen, Z., Ezzini, S., Tian, H., Klein, J., & Bissyande, T. F. (2023). Patch-CLIP: A Patch-Text Pre-Trained Model. http://adsabs.harvard.edu/abs/2023arXiv231012753T

Vancouver

Tang X, Chen Z, Ezzini S, Tian H, Klein J, Bissyande TF. Patch-CLIP: A Patch-Text Pre-Trained Model. 2023 Oct 19.

Author

Tang, Xunzhu ; Chen, Zhenghan ; Ezzini, Saad et al. / Patch-CLIP : A Patch-Text Pre-Trained Model. 2023.

Bibtex

@techreport{d1a0f93cae0d428e8f91a7a3a1180b16,
title = "Patch-CLIP: A Patch-Text Pre-Trained Model",
abstract = "In recent years, patch representation learning has emerged as a necessary research direction for exploiting the capabilities of machine learning in software generation. These representations have driven significant performance enhancements across a variety of tasks involving code changes. While the progress is undeniable, a common limitation among existing models is their specialization: they predominantly excel either in predictive tasks, such as security patch classification, or in generative tasks, such as patch description generation. This dichotomy is further exacerbated by a prevalent dependency on potentially noisy data sources. Specifically, many models utilize patches integrated with Abstract Syntax Trees (ASTs) that, unfortunately, may contain parsing inaccuracies, thus acting as a suboptimal source of supervision. In response to these challenges, we introduce PATCH-CLIP, a novel pre-training framework for patches and natural language text. PATCH-CLIP deploys a triple-loss training strategy for 1) patch-description contrastive learning, which enables the separation of patches and descriptions in the embedding space, 2) patch-description matching, which ensures that each patch is associated with its description in the embedding space, and 3) patch-description generation, which ensures that the patch embedding is effective for generation. These losses are implemented for joint learning to achieve good performance in both predictive and generative tasks involving patches. Empirical evaluations focusing on patch description generation demonstrate that PATCH-CLIP sets new state-of-the-art performance, consistently outperforming the state of the art in metrics such as BLEU, ROUGE-L, METEOR, and Recall.",
keywords = "Computer Science - Software Engineering",
author = "Xunzhu Tang and Zhenghan Chen and Saad Ezzini and Haoye Tian and Jacques Klein and Bissyande, {Tegawende F.}",
year = "2023",
month = oct,
day = "19",
language = "English",
type = "WorkingPaper",
}

RIS

TY - UNPB

T1 - Patch-CLIP

T2 - A Patch-Text Pre-Trained Model

AU - Tang, Xunzhu

AU - Chen, Zhenghan

AU - Ezzini, Saad

AU - Tian, Haoye

AU - Klein, Jacques

AU - Bissyande, Tegawende F.

PY - 2023/10/19

Y1 - 2023/10/19

N2 - In recent years, patch representation learning has emerged as a necessary research direction for exploiting the capabilities of machine learning in software generation. These representations have driven significant performance enhancements across a variety of tasks involving code changes. While the progress is undeniable, a common limitation among existing models is their specialization: they predominantly excel either in predictive tasks, such as security patch classification, or in generative tasks, such as patch description generation. This dichotomy is further exacerbated by a prevalent dependency on potentially noisy data sources. Specifically, many models utilize patches integrated with Abstract Syntax Trees (ASTs) that, unfortunately, may contain parsing inaccuracies, thus acting as a suboptimal source of supervision. In response to these challenges, we introduce PATCH-CLIP, a novel pre-training framework for patches and natural language text. PATCH-CLIP deploys a triple-loss training strategy for 1) patch-description contrastive learning, which enables the separation of patches and descriptions in the embedding space, 2) patch-description matching, which ensures that each patch is associated with its description in the embedding space, and 3) patch-description generation, which ensures that the patch embedding is effective for generation. These losses are implemented for joint learning to achieve good performance in both predictive and generative tasks involving patches. Empirical evaluations focusing on patch description generation demonstrate that PATCH-CLIP sets new state-of-the-art performance, consistently outperforming the state of the art in metrics such as BLEU, ROUGE-L, METEOR, and Recall.

AB - In recent years, patch representation learning has emerged as a necessary research direction for exploiting the capabilities of machine learning in software generation. These representations have driven significant performance enhancements across a variety of tasks involving code changes. While the progress is undeniable, a common limitation among existing models is their specialization: they predominantly excel either in predictive tasks, such as security patch classification, or in generative tasks, such as patch description generation. This dichotomy is further exacerbated by a prevalent dependency on potentially noisy data sources. Specifically, many models utilize patches integrated with Abstract Syntax Trees (ASTs) that, unfortunately, may contain parsing inaccuracies, thus acting as a suboptimal source of supervision. In response to these challenges, we introduce PATCH-CLIP, a novel pre-training framework for patches and natural language text. PATCH-CLIP deploys a triple-loss training strategy for 1) patch-description contrastive learning, which enables the separation of patches and descriptions in the embedding space, 2) patch-description matching, which ensures that each patch is associated with its description in the embedding space, and 3) patch-description generation, which ensures that the patch embedding is effective for generation. These losses are implemented for joint learning to achieve good performance in both predictive and generative tasks involving patches. Empirical evaluations focusing on patch description generation demonstrate that PATCH-CLIP sets new state-of-the-art performance, consistently outperforming the state of the art in metrics such as BLEU, ROUGE-L, METEOR, and Recall.

KW - Computer Science - Software Engineering

M3 - Preprint

BT - Patch-CLIP

ER -
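
The abstract above describes PATCH-CLIP's triple-loss pre-training strategy (patch-description contrastive learning, patch-description matching, and description generation, learned jointly). The sketch below is a minimal, illustrative PyTorch rendering of such a joint objective; the function names, tensor shapes, temperature, equal loss weighting, and toy inputs are assumptions made for clarity and are not taken from the paper, which should be consulted for the actual PATCH-CLIP formulation.

# Minimal sketch of a triple-loss objective in the spirit of PATCH-CLIP.
# Assumptions (not from the paper): PyTorch, equal loss weights, an
# InfoNCE-style contrastive term, a binary matching head, and a
# token-level generation head.
import torch
import torch.nn.functional as F

def contrastive_loss(patch_emb, desc_emb, temperature=0.07):
    """Pull matching patch/description pairs together, push the rest apart."""
    patch_emb = F.normalize(patch_emb, dim=-1)
    desc_emb = F.normalize(desc_emb, dim=-1)
    logits = patch_emb @ desc_emb.t() / temperature          # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: patch-to-description and description-to-patch.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def matching_loss(match_logits, match_labels):
    """Binary decision: does this (patch, description) pair belong together?"""
    return F.binary_cross_entropy_with_logits(match_logits, match_labels.float())

def generation_loss(decoder_logits, desc_token_ids, pad_id=0):
    """Token-level cross-entropy for generating the description from the patch."""
    return F.cross_entropy(
        decoder_logits.view(-1, decoder_logits.size(-1)),
        desc_token_ids.view(-1),
        ignore_index=pad_id,
    )

def triple_loss(patch_emb, desc_emb, match_logits, match_labels,
                decoder_logits, desc_token_ids):
    """Joint objective: summing the three terms trains one model for both
    predictive (contrastive, matching) and generative (description) use."""
    return (contrastive_loss(patch_emb, desc_emb)
            + matching_loss(match_logits, match_labels)
            + generation_loss(decoder_logits, desc_token_ids))

if __name__ == "__main__":
    # Toy shapes: batch 4, embedding dim 256, description length 16, vocab 1000.
    B, D, T, V = 4, 256, 16, 1000
    loss = triple_loss(
        patch_emb=torch.randn(B, D),
        desc_emb=torch.randn(B, D),
        match_logits=torch.randn(B),
        match_labels=torch.randint(0, 2, (B,)),
        decoder_logits=torch.randn(B, T, V),
        desc_token_ids=torch.randint(0, V, (B, T)),
    )
    print(f"joint triple loss: {loss.item():.3f}")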