Poster
CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming
Ali TehraniJamsaz · Arijit Bhattacharjee · Le Chen · Nesreen K. Ahmed · Amir Yazdanbakhsh · Ali Jannesari
East Exhibit Hall A-C #3400
Automatic translation of programming languages has garnered renewed interest, driven by recent advancements in large language models (LLMs). Encoder-decoder transformer models, in particular, have shown promise in translating between different programming languages. However, translating between a language and its high-performance computing (HPC) extension remains underexplored due to inherent challenges such as understanding complex parallel semantics. In this paper, we introduce CodeRosetta, an encoder-decoder transformer model explicitly designed to translate between programming languages and their HPC extensions. CodeRosetta is evaluated on C++ to CUDA and Fortran to C++ translation. It employs a customized learning-based framework with tailored pretraining and training objectives that enable it to effectively capture code semantics and parallel structural nuances, allowing for bidirectional code translation. Our results show that CodeRosetta outperforms state-of-the-art baselines in C++ to CUDA translation by 2.9 BLEU and 1.72 CodeBLEU points while improving compilation accuracy by 6.05%. Compared to general closed-source LLMs, our proposed bidirectional learning-based method improves C++ to CUDA translation by 22.08 BLEU and 14.39 CodeBLEU with 2.75% higher compilation accuracy. Finally, CodeRosetta exhibits proficiency in Fortran to parallel C++ translation, marking it, to our knowledge, as the first encoder-decoder model for such a complex translation task, improving CodeBLEU by at least 4.63 points compared to closed-source LLMs and open code LLMs.
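To illustrate the kind of parallel-semantics mapping such a translation must capture, consider a hypothetical SAXPY example (not drawn from the paper): a serial C++ loop and a CUDA kernel that performs the same computation. A translator must recognize that the loop iterations are independent, rewrite the induction variable as a thread index, add a bounds guard, and produce a matching launch configuration; the function names below are illustrative only.

// Serial C++ source: iterate over all n elements in order.
void saxpy_cpu(int n, float a, const float* x, float* y) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Translated CUDA kernel: each thread handles one element.
// The loop index becomes a global thread index, and the loop
// bound becomes a guard against out-of-range threads.
__global__ void saxpy_kernel(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

// Host-side launch, with the grid sized to cover all n elements
// (assumes d_x and d_y are device pointers):
//   saxpy_kernel<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);

Even in this minimal case, the mapping is structural rather than token-level (loop bound to guard condition, induction variable to thread index), which is the kind of nuance the paper's tailored pretraining and training objectives target.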