Reverse engineering of binary executables is a critical problem in the computer security domain. On the one hand, malicious parties may recover interpretable source code from software products to gain commercial advantages. On the other hand, binary decompilation can be leveraged for code vulnerability analysis and malware detection. However, efficient binary decompilation is challenging. Conventional decompilers have the following major limitations: (i) they are only applicable to a specific source-target language pair, and hence incur undesired development cost for new language tasks; (ii) their output high-level code cannot effectively preserve the correct functionality of the input binary; (iii) their output program does not capture the semantics of the input, and the reversed program is hard to interpret. To address the above problems, we propose Coda, the first end-to-end neural-based framework for code decompilation. Coda decomposes the decompilation task into two key phases. First, during the code sketch generation stage, Coda employs an instruction type-aware encoder and a tree decoder with attention feeding to generate an abstract syntax tree (AST). Second, Coda updates the code sketch using an iterative error correction machine guided by an ensembled neural error predictor. By finding a good approximate candidate and then refining it until it is correct, Coda achieves superior performance compared to baseline approaches. We assess Coda's performance with extensive experiments on various benchmarks. Evaluation results show that Coda achieves an average of 82% program recovery accuracy on unseen binary samples, whereas the state-of-the-art decompilers yield 0% accuracy. Furthermore, Coda outperforms the sequence-to-sequence model with attention by a margin of 70% program accuracy. Our work reveals the vulnerability of binary executables and poses a new threat to the protection of Intellectual Property (IP) for software development.
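The two-phase idea in the abstract (produce an approximate candidate, then repair it iteratively under the guidance of an error predictor) can be illustrated with a toy sketch. This is not Coda's implementation: the neural encoder, tree decoder, and ensembled error predictor are replaced by trivial stand-ins, and every name below (`COMPILE_TABLE`, `generate_sketch`, `predict_error`, `correct`) is hypothetical and chosen only for illustration.

```python
# Toy illustration only: all names are hypothetical, not taken from Coda.
COMPILE_TABLE = {"+": "add", "-": "sub", "*": "mul"}  # stand-in "compiler"


def compile_tokens(tokens):
    """Map each high-level operator token to a low-level opcode."""
    return [COMPILE_TABLE[t] for t in tokens]


def generate_sketch(binary):
    """Phase 1 stand-in: an imperfect first guess (always guesses '+')."""
    return ["+"] * len(binary)


def predict_error(candidate, binary):
    """Phase 2 stand-in for an error predictor: first mismatching index, or None."""
    for i, (got, want) in enumerate(zip(compile_tokens(candidate), binary)):
        if got != want:
            return i
    return None


def correct(candidate, binary, max_iters=10):
    """Repair one flagged position per iteration until the recompiled candidate matches."""
    candidate = list(candidate)
    for _ in range(max_iters):
        pos = predict_error(candidate, binary)
        if pos is None:
            break  # recompiled candidate matches the input binary
        for token, opcode in COMPILE_TABLE.items():
            if opcode == binary[pos]:
                candidate[pos] = token
                break
        else:
            break  # no known token fixes this position; give up
    return candidate


binary = ["add", "mul", "sub"]
recovered = correct(generate_sketch(binary), binary)
print(recovered)  # ['+', '*', '-']
```

In this toy setting the error predictor is an exact comparison against the recompiled candidate; in a learned system the predictor would be approximate, which is why the loop is bounded by `max_iters`.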
Author Information
Cheng Fu (University of California, San Diego)
Huili Chen (UCSD)
Haolan Liu (UCSD)
Xinyun Chen (UC Berkeley)
Yuandong Tian (Facebook AI Research)
Farinaz Koushanfar (UCSD)
Jishen Zhao (UCSD)
More from the Same Authors
- 2022 : FL-Talk: Covert Communication in Federated Learning via Spectral Steganography
  Huili Chen · Farinaz Koushanfar
- 2022 Spotlight: Self-Aware Personalized Federated Learning
  Huili Chen · Jie Ding · Eric W. Tramel · Shuang Wu · Anit Kumar Sahu · Salman Avestimehr · Tao Zhang
- 2022 : Panel RL Implementation
  Xiaolin Ge · Alborz Geramifard · Kence Anderson · Craig Buhr · Robert Nishihara · Yuandong Tian
- 2022 Poster: Self-Aware Personalized Federated Learning
  Huili Chen · Jie Ding · Eric W. Tramel · Shuang Wu · Anit Kumar Sahu · Salman Avestimehr · Tao Zhang
- 2021 : ML-guided iterative refinement for system optimization
  Yuandong Tian
- 2021 : Community Infrastructure for Applying Reinforcement Learning to Compiler Optimizations
  Chris Cummins · Bram Wasti · Brandon Cui · Olivier Teytaud · Benoit Steiner · Yuandong Tian · Hugh Leather
- 2021 Poster: Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages
  Xinyun Chen · Dawn Song · Yuandong Tian
- 2021 : Machine Learning for Combinatorial Optimization + Q&A
  Maxime Gasse · Simon Bowly · Chris Cameron · Quentin Cappart · Jonas Charfreitag · Laurent Charlin · Shipra Agrawal · Didier Chetelat · Justin Dumouchelle · Ambros Gleixner · Aleksandr Kazachkov · Elias Khalil · Pawel Lichocki · Andrea Lodi · Miles Lubin · Christopher Morris · Dimitri Papageorgiou · Augustin Parjadis · Sebastian Pokutta · Antoine Prouvost · Yuandong Tian · Lara Scavuzzo · Giulia Zarpellon
- 2021 Poster: NovelD: A Simple yet Effective Exploration Criterion
  Tianjun Zhang · Huazhe Xu · Xiaolong Wang · Yi Wu · Kurt Keutzer · Joseph Gonzalez · Yuandong Tian
- 2021 Poster: MADE: Exploration via Maximizing Deviation from Explored Regions
  Tianjun Zhang · Paria Rashidinejad · Jiantao Jiao · Yuandong Tian · Joseph Gonzalez · Stuart Russell
- 2021 Poster: Learning Space Partitions for Path Planning
  Kevin Yang · Tianjun Zhang · Chris Cummins · Brandon Cui · Benoit Steiner · Linnan Wang · Joseph Gonzalez · Dan Klein · Yuandong Tian
- 2020 : QA: Yuandong Tian
  Yuandong Tian
- 2020 : Contributed Talk: Yuandong Tian
  Yuandong Tian
- 2020 : Panel
  Augustus Odena · Charles Sutton · Roopsha Samanta · Xinyun Chen · Elena Glassman
- 2020 : Xinyun Chen Talk
  Xinyun Chen
- 2020 : Invited Talk (Yuandong Tian)
  Yuandong Tian
- 2020 Poster: Synthesize, Execute and Debug: Learning to Repair for Neural Program Synthesis
  Kavi Gupta · Peter Ebert Christensen · Xinyun Chen · Dawn Song
- 2020 Poster: Compositional Generalization via Neural-Symbolic Stack Machines
  Xinyun Chen · Chen Liang · Adams Wei Yu · Dawn Song · Denny Zhou
- 2020 Poster: Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search
  Linnan Wang · Rodrigo Fonseca · Yuandong Tian
- 2020 Poster: Joint Policy Search for Multi-agent Collaboration with Imperfect Information
  Yuandong Tian · Qucheng Gong · Yu Jiang
- 2019 Poster: Hierarchical Decision Making by Generating and Following Natural Language Instructions
  Hengyuan Hu · Denis Yarats · Qucheng Gong · Yuandong Tian · Mike Lewis
- 2019 Poster: One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers
  Ari Morcos · Haonan Yu · Michela Paganini · Yuandong Tian
- 2019 Poster: Learning to Perform Local Rewriting for Combinatorial Optimization
  Xinyun Chen · Yuandong Tian
- 2018 Poster: Tree-to-tree Neural Networks for Program Translation
  Xinyun Chen · Chang Liu · Dawn Song
- 2017 Poster: ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
  Yuandong Tian · Qucheng Gong · Wendy Shang · Yuxin Wu · Larry Zitnick
- 2017 Oral: ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
  Yuandong Tian · Qucheng Gong · Wendy Shang · Yuxin Wu · Larry Zitnick