Treffer: Transformer-based networks over tree structures for code classification.

Title:
Transformer-based networks over tree structures for code classification.
Source:
Applied Intelligence; Jun2022, Vol. 52 Issue 8, p8895-8909, 15p
Database:
Complementary Index

Weitere Informationen

In software engineering (SE), code classification and related tasks, such as code clone detection are still challenging problems. Due to the elusive syntax and complicated semantics in software programs, existing traditional SE approaches still have difficulty differentiating between the functionalities of code snippets at the semantic level with high accuracy. As artificial intelligence (AI) techniques have increased in recent years, exploring different machine/deep learning techniques for code classification algorithms has become important. However, most existing machine/deep learning-based approaches often consider using convolutional neural networks (CNNs) or recurrent neural networks (RNNs) to process code texts. However, the two networks inevitably suffer from gradient vanishing problems and fail to capture long-distance dependencies from code statements, resulting in poor performance in downstream tasks. In this paper, we propose the TBCC (Transformer-Based Code Classifier), a novel transformer-based neural network for programming language processing, which can avoid these two problems. Moreover, to capture the important syntactical features from programming languages, we split the deep abstract syntax trees (ASTs) into smaller subtrees that, aim to exploit syntactical information in code statements. We have applied the TBCC to two different common program comprehension tasks to verify its effectiveness: a code classification task for C programs and a code clone detection task for Java programs. The experimental results show that the TBCC achieves state-of-the-art performance, outperforming the baseline methods in terms of accuracy, recall, and F1 score. For subsequent research, the code of TBCC has been released <sup>∗</sup>. [ABSTRACT FROM AUTHOR]

Copyright of Applied Intelligence is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)