Machine learning for understanding and editing source code has recently attracted significant interest, with many developments in new models, new code representations, and new tasks. This proliferation can appear disparate and disconnected, making each approach seemingly unique and incompatible, thus obscuring the core machine learning challenges and contributions. In this work, we demonstrate that the landscape can be significantly simplified by taking a general approach of mapping a graph to a sequence of tokens and pointers. Our main result is to show that 16 recently published tasks of different shapes can be cast in this form, based on which a single model architecture achieves near or above state-of-the-art results on nearly all tasks, outperforming custom models like code2seq and alternative generic models like Transformers. This unification further enables multi-task learning and a series of cross-cutting experiments about the importance of different modeling choices for code understanding and repair tasks. The full framework, called PLUR, is easily extensible to more tasks, and will be open-sourced (https://github.com/google-research/plur).
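To make the graph-to-(tokens and pointers) formulation above concrete, the following is a minimal, purely illustrative Python sketch of how a toy variable-misuse repair could be cast in that shape. The class names and data layout here are hypothetical and do not reflect the released PLUR API.

```python
# Hypothetical illustration (not the actual PLUR API): casting a tiny
# variable-misuse repair as a graph-to-(tokens + pointers) problem.
from dataclasses import dataclass, field
from typing import List, Tuple, Union

@dataclass
class GraphInput:
    """Program encoded as a graph: one node per token, plus typed edges."""
    node_labels: List[str]
    edges: List[Tuple[int, int, str]] = field(default_factory=list)  # (src, dst, edge_type)

# The output is a sequence whose elements are either vocabulary tokens or
# pointers (indices) back into the input graph's nodes.
Token = str
Pointer = int
OutputStep = Union[Token, Pointer]

# Input program: `def add(a, b): return a + a`  (bug: the second `a` should be `b`).
graph = GraphInput(
    node_labels=["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "a"],
    edges=[(i, i + 1, "NEXT_TOKEN") for i in range(11)],  # real graphs would add AST/data-flow edges
)

# Repair target expressed as tokens + pointers:
# point at the faulty node (index 11), then emit the replacement token "b".
target: List[OutputStep] = [Pointer(11), Token("b")]

print(graph.node_labels[11], "->", target[1])  # a -> b
```

The intuition is that pointers refer back to input nodes (for example, to localize a buggy token), while plain tokens express newly generated content, so a single output format can describe tasks of quite different shapes.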
Author Information
Zimin Chen (KTH Royal Institute of Technology, Stockholm, Sweden)
Vincent J Hellendoorn (CMU)
I create intelligent tools for software engineers using machine learning. The potential of this intersection is tremendous: artificially intelligent models can (re)learn many software development processes and provide valuable support in coding, debugging, optimization, ensuring security, and more. But programming is also a very human activity, so supporting developers effectively is non-trivial: many of the most interesting tasks require rich insights into how developers write and reason about software, and my research has shown that popular models lacking those insights are often mismatched to practice. Instead, I study machine learning and software engineering research side by side. My work makes fundamental advances in deep learning models for source code, leverages empirical methods to enable ground-breaking new tasks, and reflects on current models with real developer data to ensure that we are moving in the right direction.
Pascal Lamblin (Google Research – Brain Team)
Petros Maniatis (Google Brain)
Pierre-Antoine Manzagol (Google)
Daniel Tarlow (Google Research, Brain team)
Subhodeep Moitra (Google, Inc.)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: PLUR: A Unifying, Graph-Based View of Program Learning, Understanding, and Repair
More from the Same Authors
- 2021 Spotlight: Learning Generalized Gumbel-max Causal Mechanisms
  Guy Lorberbom · Daniel D. Johnson · Chris Maddison · Daniel Tarlow · Tamir Hazan
- 2021 Workshop: Advances in Programming Languages and Neurosymbolic Systems (AIPLANS)
  Breandan Considine · Disha Shrivastava · David Yu-Tung Hui · Chin-Wei Huang · Shawn Tan · Xujie Si · Prakash Panangaden · Guy Van den Broeck · Daniel Tarlow
- 2021 Poster: Structured Denoising Diffusion Models in Discrete State-Spaces
  Jacob Austin · Daniel D. Johnson · Jonathan Ho · Daniel Tarlow · Rianne van den Berg
- 2021 Poster: Learning to Combine Per-Example Solutions for Neural Program Synthesis
  Disha Shrivastava · Hugo Larochelle · Daniel Tarlow
- 2021 Poster: Learning Generalized Gumbel-max Causal Mechanisms
  Guy Lorberbom · Daniel D. Johnson · Chris Maddison · Daniel Tarlow · Tamir Hazan
- 2020: Spotlight Session 1
  Augustus Odena · Maxwell Nye · Disha Shrivastava · Mayank Agarwal · Vincent J Hellendoorn · Charles Sutton
- 2019 Workshop: Program Transformations for ML
  Pascal Lamblin · Atilim Gunes Baydin · Alexander Wiltschko · Bart van Merriënboer · Emily Fertig · Barak Pearlmutter · David Duvenaud · Laurent Hascoet
- 2019 Poster: Reducing the variance in online optimization by transporting past gradients
  Sébastien Arnold · Pierre-Antoine Manzagol · Reza Babanezhad Harikandeh · Ioannis Mitliagkas · Nicolas Le Roux
- 2019 Spotlight: Reducing the variance in online optimization by transporting past gradients
  Sébastien Arnold · Pierre-Antoine Manzagol · Reza Babanezhad Harikandeh · Ioannis Mitliagkas · Nicolas Le Roux