Integrating Tree Path in Transformer for Code Representation
Han Peng · Ge Li · Wenhan Wang · YunFei Zhao · Zhi Jin

Thu Dec 09 12:30 AM -- 02:00 AM (PST) @ Virtual

Learning distributed representations of source code requires modelling its syntax and semantics. Recent state-of-the-art models leverage highly structured source code representations, such as syntax trees and the paths within them. In this paper, we investigate two representative path encoding methods from previous research and integrate them into the attention module of the Transformer. We draw inspiration from the ideas of positional encoding and adapt them to incorporate these path encodings. Specifically, we encode both the pairwise path between tokens of source code and the path from the leaf node to the tree root for each token in the syntax tree. We explore the interaction between these two kinds of paths by integrating them into a unified Transformer framework. The detailed empirical study of path encoding methods also leads to our novel state-of-the-art representation model, TPTrans, which outperforms strong baselines. Extensive experiments and ablation studies on code summarization across four different languages demonstrate the effectiveness of our approaches. We release our code at https://github.com/AwdHanPeng/TPTrans.
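
As a rough illustration of the idea in the abstract, the sketch below shows one way two kinds of path embeddings could enter self-attention: a leaf-to-root path embedding added to each token (in the spirit of absolute positional encoding) and a pairwise path embedding that shifts the key for each token pair (in the spirit of relative positional encoding). This is not the TPTrans implementation. The function name `path_aware_attention`, the use of identity projections instead of learned query/key/value matrices, and the exact way the embeddings are combined are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def path_aware_attention(tokens, abs_path, rel_path):
    """Single-head self-attention with two path signals (hypothetical sketch).

    tokens:   n token embeddings, each a list of d floats
    abs_path: n embeddings of each token's leaf-to-root path (d floats each)
    rel_path: n x n embeddings of the pairwise path between tokens i and j
    Assumption: identity query/key/value projections, to keep the sketch short.
    """
    d = len(tokens[0])
    n = len(tokens)
    # Absolute path: added to the token embedding, like an absolute positional encoding.
    x = [[t + p for t, p in zip(tok, path)] for tok, path in zip(tokens, abs_path)]
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            # Relative path: shifts the key for pair (i, j), like a relative positional encoding.
            key = [k + r for k, r in zip(x[j], rel_path[i][j])]
            scores.append(dot(x[i], key) / math.sqrt(d))
        weights = softmax(scores)
        # Output is the attention-weighted average of the (path-augmented) token vectors.
        out.append([sum(w * x[j][c] for j, w in enumerate(weights)) for c in range(d)])
    return out
```

With all path embeddings set to zero, this reduces to plain scaled dot-product self-attention over the tokens. A nonzero pairwise embedding for a pair (i, j) changes how strongly token i attends to token j.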

Author Information

Han Peng (Peking University)
Ge Li (Peking University)
Wenhan Wang (Peking University)
YunFei Zhao (Key Lab of High Confidence Software Technology, MoE (Peking University), Beijing, China)
Zhi Jin (Key Lab of High Confidence Software Technologies (Peking University), Ministry o)
