Workshop
Advances in Programming Languages and Neurosymbolic Systems (AIPLANS)
Breandan Considine · Disha Shrivastava · David YuTung Hui · ChinWei Huang · Shawn Tan · Xujie Si · Prakash Panangaden · Guy Van den Broeck · Daniel Tarlow
Neural information processing systems have benefited tremendously from the availability of programming languages and frameworks for automatic differentiation (AD). Not only do NeurIPS benefit from programming languages for automatic inference but can also be considered as a language in their own right, consisting of differentiable and stochastic primitives. Combined with neural language models, these systems are increasingly capable of generating symbolic programs a human programmer might write in a highlevel language. Developing neurosymbolic systems for automatic program synthesis requires insights from both statistical learning and programming languages.
AIPLANS invites all researchers working towards the same purpose in these two communities to build on common ground. Our workshop is designed to be as inclusive as possible towards researchers engaged in building programming languages and neurosymbolic systems.
Schedule
Tue 3:45 a.m.  4:00 a.m.

Introductory remarks
(
Introductory Remarks
)
SlidesLive Video 
Breandan Considine 🔗 
Tue 4:00 a.m.  4:45 a.m.

Thinking like Transformers  Gail Weiss  Technion  Israel Institute of Technology
(
Invited Talk
)
SlidesLive Video Transformers  the purely attention based NN architecture  have emerged as a powerful tool in sequence processing. But how does a transformer think? When we discuss the computational power of RNNs, or consider a problem that they have solved, it is easy for us to think in terms of automata and their variants (such as counter machines and pushdown automata). But when it comes to transformers, no such intuitive model is available. In this talk I will present a programming language, RASP (Restricted Access Sequence Processing), which we hope will serve the same purpose for transformers as finite state machines do for RNNs. In particular, we will identify the base computations of a transformer and abstract them into a small number of primitives, which are composed into a small programming language. We will go through some example programs in the language, and discuss how a given RASP program relates to the transformer architecture. 
Gail Weiss 🔗 
Tue 4:45 a.m.  4:55 a.m.

Q&A  Gail Weiss
(
Post Talk Q & A
)

🔗 
Tue 5:00 a.m.  6:00 a.m.

When Gödel discovered Automatic Differentiation  Marie Kerjean  Centre national de la recherche scientifique
(
Invited Talk
)
SlidesLive Video I will explore the boundaries between differentiable programming and logic, through the prism of the CurryHoward correspondence. I will recall the latter and explain how automatic differentiation fits into each of its three facets: functions, proofs and programs. In particular, I will explain how backpropagation is identified with Gödel's Dialectica translation, a transformation of logical formulas historically used to prove consistency theorems and widely used in proof theory since then. 
AIPLANS 2021 🔗 
Tue 6:00 a.m.  7:00 a.m.

Building machines that learn and think like people by learning to write programs: progress, open problems, and next steps  Josh Tenenbaum  Massachusetts Institute of Technology
(
Invited Talk
)
SlidesLive Video If we want to build machines that think and learn like humans do, and that can learn and think with people, our best bet is to build machines that can learn to write programs expressing their thoughts in humanunderstandable code. These machines should also be able to learn from the kinds of data that humans naturally consume and produce: one or a few examples of program execution, and natural language descriptions of program goals or highlevel structure. We are far from achieving this goal, but the last few years have seen intriguing first steps and opened up a new set of hard problems for future work. I will talk about some lessons learned: how we might best combine neural and symbolic approaches under the broad rubric of probabilistic inference in hierarchical generative models for code, and the synergies to be gained from looking at both execution examples and natural language as sources of data. I will also discuss promising nearterm challenge domains that capture foundational human capacities for learning concepts, systems of concepts (or domain theories) and causal models, and where the next generation of program learning approaches could make important progress. 
Josh Tenenbaum 🔗 
Tue 7:00 a.m.  7:15 a.m.

Short break
(
Break
)

🔗 
Tue 7:15 a.m.  8:15 a.m.

Panel Discussion
(
Panel Discussion
)
SlidesLive Video 
🔗 
Tue 8:15 a.m.  8:55 a.m.

Daniel Selsam Microsoft Research
(
Tutorial
)
SlidesLive Video 
Daniel Selsam 🔗 
Tue 8:55 a.m.  9:05 a.m.

Q&A  Daniel Selsam
(
Post Talk Q & A
)

🔗 
Tue 9:00 a.m.  10:15 a.m.

Lunch / Poster Session ( Poster Session ) link  🔗 
Tue 10:15 a.m.  10:20 a.m.

Remarks from Organisers
(
Introduction
)

🔗 
Tue 10:20 a.m.  10:46 a.m.

Randomized Automatic Differentiation  Ryan Adams  Princeton University
(
Invited Talk
)
SlidesLive Video Optimization is at the heart of machine learning, and gradient computation is central to many optimization techniques. Stochastic optimization, in particular, has taken center stage as the principal method of fitting many models, from deep neural networks to variational Bayesian posterior approximations. Generally, one uses data subsampling to efficiently construct unbiased gradient estimators for stochastic optimization, but this is only one possibility. In this talk, I will discuss an alternative approach to constructing unbiased gradient estimates in machine learning problems. We will revisit the Jacobian accumulation problem at the heart of automatic differentiation, observing that it is possible to collapse the linearized computational graph of, e.g., deep neural networks, in a randomized way such that less memory is used but little performance is lost. This is joint work with students Alex Beatson, Deniz Oktay, Joshua Aduol, and Nick McGreivy. 
Ryan Adams 🔗 
Tue 10:46 a.m.  10:56 a.m.

Q&A  Ryan Adams
(
Post Talk Q & A
)

🔗 
Tue 11:00 a.m.  11:45 a.m.

Dependent Types for Machine Learning in Dex  David Duvenaud  University of Toronto
(
Invited Talk
)
SlidesLive Video This talk will give a gentle introduction to Dex, an experimental programming language. Dex is designed to combine the clarity and safety of highlevel functional languages with the efficiency of lowlevel numerical languages. For example, Dex allows one to move much of the informal type and shape information normally contained in comments into compiletime checked types, while also omitting unambiguous details, to keep things terse. It also allows inplace updates and stateful, loopy code that can automatically take advantage of parallelism in a finegrained way. We'll demonstrate these features on standard deep architectures like attention and graph neural nets. 
David Duvenaud · AIPLANS 2021 🔗 
Tue 11:45 a.m.  11:55 a.m.

Q&A  David Duvenaud
(
Post Talk Q & A
)

🔗 
Tue 12:00 p.m.  12:45 p.m.

Differential Inference: A Criminally Underused Tool.  Alexander Rush  Cornell University
(
Invited Talk
)
SlidesLive Video Differential Inference is the use of differentiation to perform probabilistic inference. The technique itself is relatively straightforward and plays nicely with autodiff: it roughly just automates Bayes' rule the way autodiff automates the chain rule. However, there is still a tendency for students to get tied up in the knots of even elementary probabilistic inference. Inspired by polemics that shined light on autodifferentiation, this talk will be half a tutorial on the use of differential inference and half a demonstration of all the fun math that it can remove from your life. 
Alexander Rush 🔗 
Tue 12:45 p.m.  12:55 p.m.

Q&A  Alexander Rush
(
Post Talk Q & A
)

🔗 
Tue 1:00 p.m.  1:05 p.m.

Introduction to Spotlight Speakers
(
Organiser Remarks
)

🔗 
Tue 1:05 p.m.  1:15 p.m.

MetaLearning an Inference Algorithm for Probabilistic Programs  Gwonsoo Che
(
Spotlight Talks
)
SlidesLive Video 
AIPLANS 2021 · Gwonsoo Che 🔗 
Tue 1:15 p.m.  1:22 p.m.

LazyPPL: laziness and types in nonparametric probabilistic programs  Hugo Paquet
(
Spotlight Talk
)
SlidesLive Video 
AIPLANS 2021 · Hugo Paquet 🔗 
Tue 1:22 p.m.  1:32 p.m.

Learning Rules with Stratified Negation in Differentiable ILP  Giri Krishnan
(
Spotlight Talks
)
SlidesLive Video 
AIPLANS 2021 · Giri Krishnan 🔗 
Tue 1:32 p.m.  1:41 p.m.

Learning Adaptive Control Flow in Transformers for Improved Systematic Generalization  Róbert Csordás
(
Spotlight Talk
)
SlidesLive Video 
AIPLANS 2021 · Róbert Csordás 🔗 
Tue 1:41 p.m.  1:51 p.m.

Type Inference as Optimization  Eirene V. Pandi
(
Spotlight Talk
)
SlidesLive Video 
AIPLANS 2021 · Eirini V. Pandi 🔗 
Tue 1:51 p.m.  2:00 p.m.

Q&A for Spotlight Authors
(
Q & A
)

🔗 
Tue 2:00 p.m.  2:15 p.m.

Closing Remarks
(
Closing remarks
)
SlidesLive Video 
Breandan Considine 🔗 
Tue 2:15 p.m.  3:00 p.m.

Poster Session ( Poster Session ) link  AIPLANS 2021 🔗 


Type Inference as Optimization
(
Poster
)
link
Optionally typed dynamic languages can permit multiple valid type assignments. When this happens, developers can prefer one valid type assignment over another because it better reflects how they think about the program and the problem it solves. Natural type inference (NTI) uses natural language text within source code, such as identifiers, to help choose valid programming language types. A growing body of techniques has been proposed for NTI. These techniques predict types; they seek to return natural type assignments (assignments that reflect developer preferences) while striving for correctness. They are empirically effective, but they are not sound by construction: they do not leverage programming language theory to formalize their algorithms and show correctness and termination. Filling this foundational gap is the purpose of this paper. We are the first to present a detailed algorithm for NTI that is validated with theorems and proofs. Valid type assignments obey logical constraints arising from type rules; natural type assignments obey natural constraints arising from the natural language text associated with a variable and its uses.The core intuition of this work is that logical and natural constraints can interact to speed finding a type valuation that 1. type checks (satisfies the logical constraints) and 2. is most natural.We formulate NTI as a joint optimization problem. To do this, we define a numerical relaxation over boolean logical constraints that give us a condition that we treat as a hard constraint, while simultaneously we minimize distance from natural constraints, which we treat as soft constraints for our optimization problem. Our main result, the first formal proof of soundness for natural type inference, is that our algorithm always terminates, either with an error or with a tuple that is guaranteed to be a type signature for its input. 
Eirini V. Pandi · Earl Barr · Andrew Gordon · Charles Sutton 🔗 


Are Transformers All That Karel Needs?
(
Poster
)
link
Recent works have shown the incredible promise of using neural networks for the task of program synthesis from inputoutput examples. In this paper, we propose using Transformerbased architectures as a baseline for the program synthesis task on the Karel dataset. Specifically, we leverage DistillGPT as our decoder model to perform program synthesis for the Karel DSL. We show that changing the model architecture from an LSTM to a transformer based architecture, we are able to significantly improve on supervised learning approaches obtaining a top1 generalization accuracy of 82.4%. Further, applying execution guided search on the output beams of the model increases the accuracy of our approach to 89.64%. 
Abhay Garg · Anand Sriraman · Shirish Karande 🔗 


Towards Neural Functional Program Evaluation
(
Poster
)
link
This paper explores the capabilities of current transformerbased language models for program evaluation of simple functional programming languages. We introduce a new program generation mechanism that allows control over syntactic sugar for semantically equivalent programs. T5 experiments reveal that neural functional program evaluation performs surprisingly well, achieving high 90% exact program match scores for most indistribution and outofdistribution tests. We present and evaluate on three datasets to study generalization abilities that are specific to functional programs based on: type, function composition, and reduction steps. 
Torsten Scholak · Jonathan Pilault 🔗 


Staged compilation of tensor expressions
(
Poster
)
link
We present our current progress towards a metaprogramming framework for tensor expressions embedded in Haskell; the system offers a highlevel syntax for dimensionannotated linear algebra, and generates specialized source code corresponding to the input expression. 
Marco Zocca 🔗 


Safe Neurosymbolic Learning with Differentiable Symbolic Execution
(
Poster
)
link
We study the problem of learning verifiably safe parameters for programs that use neural networks as well as symbolic, humanwritten code. Such neurosymbolic programs arise in many safetycritical domains. However, because they need not be differentiable, they cannot be learned using existing approaches to integrating learning and verification. Our method, Differentiable Symbolic Execution (DSE), learns such programs by sampling code paths using symbolic execution, constructing gradients of a worstcase ``safety loss'' along these paths, and then backpropagating these gradients through program operations using a generalization of the reinforce estimator. We evaluate the method on a mix of synthetic tasks and realworld control and navigation benchmarks. Our experiments show that DSE significantly outperforms the stateoftheart DiffAI method on these tasks. 
Chenxi Yang · Swarat Chaudhuri 🔗 


AutoCoder: Leveraging Transformers for Automatic Code Synthesis
(
Poster
)
link
Program synthesis from natural language descriptions is a challenging task. This paper explores two variants of transformer models for the task of program synthesis and showcase higher performance than the existing SOTA models. Through the end, we also discuss the differences in learned representation in these two variants. We demonstrate that the vanilla transformer model has a higher capacity to memorize the training data as compared to the other variant. 
Mrinal Anand · Mayank Singh 🔗 


Learning Rules with Stratified Negation in Differentiable ILP.
(
Poster
)
link
Differentiable methods to learn rules (logic programs) have the potential to integrate the interpretability, transferability and low data requirements of inductive logic programming with the noise tolerance of nonsymbolic learning. While negation is an essential component of reasoning, incorporating it into a logic programming framework poses several problems (hence its central place in the logic programming and nonmonotonic reasoning communities). Current implementations of differentiable rule learners either exclude negation entirely or else treat it only in passing. In this work, we introduce stratified negation into a differentiable inductive logic programming framework, and we demonstrate that the resulting system can learn recursive programs in which negation plays a central role. We include examples from multiple domains, e.g., arithmetic, graph, sets and lists. 
Giri Krishnan · Ramyaa Ramyaa 🔗 


AutumnSynth: Synthesis of Reactive Programs with Structured Latent State
(
Poster
)
link
The human ability to efficiently discover causal theories of their environments from observations is a feat of nature that remains elusive in machines. In this work, we attempt to make progress on this frontier by formulating the challenge of causal mechanism discovery from observed data as one of program synthesis. We focus on the domain of timevarying, Atarilike 2D grid worlds, and represent causal models in this domain using a programming language called Autumn. Discovering the causal structure underlying a sequence of observations is equivalent to identifying the program in the Autumn language that generates the observations. We introduce a novel program synthesis algorithm, called AutumnSynth, that approaches this synthesis challenge by integrating standard methods of synthesizing functions with an automata synthesis approach, used to discover the model's latent state. We evaluate our method on a suite of Autumn programs designed to express the richness of the domain, and our results signal the potential of our formulation. 
Ria Das · Zenna Tavares · Josh Tenenbaum · Armando SolarLezama 🔗 


PAC Synthesis of Machine Learning Programs
(
Poster
)
link
We study the problem of synthesizing programs that include machine learning components such as deep neural networks (DNNs). We focus on statistical properties, which are properties expected to hold with high probabilitye.g., that an image classification model correctly identifies people in images with high probability. We propose novel algorithms for sketching and synthesizing such programs by leveraging ideas from statistical learning theory to provide statistical soundness guarantees. We evaluate our approach on synthesizing list processing programs that include DNN components used to process image inputs, as well as case studies on image classification and on precision medicine. Our results demonstrate that our approach can be used to synthesize programs with probabilistic guarantees. 
Osbert Bastani 🔗 


Learning compositional programs with arguments and sampling
(
Poster
)
link
One of the most challenging goals in designing intelligent systems is empowering them with the ability to synthesize programs from data. Namely, given specific requirements in the form of input/output pairs, the goal is to train a machine learning model to discover a program that satisfies those requirements. A recent class of methods exploits combinatorial search procedures and deep learning to learn compositional programs. However, they usually generate only toy programs using a domainspecific language that does not provide any highlevel feature, such as function arguments, which reduces their applicability in realworld settings. We extend upon a state of the art model, AlphaNPI, by learning to generate functions that can accept arguments. This improvement will enable us to move closer to real computer programs. We showcase the potential of our approach by learning the Quicksort algorithm, showing how the ability to deal with arguments is crucial for learning and generalization. 
Giovanni De Toni · Andrea Passerini 🔗 


Learning Adaptive Control Flow in Transformers for Improved Systematic Generalization
(
Poster
)
link
Despite successes across a broad range of applications, Transformers have limited capability in systematic generalization. The situation is especially frustrating for algorithmic tasks, where they often fail to find intuitive solutions that can be simply expressed in terms of attention patterns. Here we propose two modifications to the Transformer architecture, copy gate and geometric attention, which facilitate learning such intuitive and interpretable solutions to algorithmic problems. Our novel Transformer, called Transformer Control Flow (TCF) achieves 100% length generalization accuracy on the classic compositional table lookup task. The resulting attention and gating patterns are interpretable, demonstrating that the model implements adaptive control flow. 
Róbert Csordás · Kazuki Irie · Jürgen Schmidhuber 🔗 


Adversarial Robustness of Program Synthesis Models
(
Poster
)
link
The resurgence of automatic program synthesis has been observed with the rise of deep learning. In this paper, we study the behaviour of the program synthesis model under adversarial settings. Our experiments suggest that these program synthesis models are prone to adversarial attacks. The proposed transformer model have higher adversarial performance than the current stateoftheart program synthesis model. We specifically experiment with \textsc{AlgoLisp} DSLbased generative models and showcase the existence of significant dataset bias through different classes of adversarial examples. 
Mrinal Anand · Mayank Singh 🔗 


Learning C to x86 Translation: An Experiment in Neural Compilation
(
Poster
)
link
Deep learning has had a significant impact on many fields. Recently, codetocode neural models have been used in code translation, code refinement and decompilation. However, the question of whether these models can automate compilation has yet to be investigated. In this work, we explore neural compilation, building and evaluating Transformer models that learn how to produce x86 assembler from C code.Although preliminary results are relatively weak, we make our data, models and code publicly available to encourage further research in this area. 
Jordi ArmengolEstapé · Michael O'Boyle 🔗 


Synthesizing Video Trajectory Queries
(
Poster
)
link
We propose a novel framework called Quivr for synthesizing queries to identify events of interest in video data. For instance, Quivr can be used to identify instances of human driving behaviors such as lane changes or left turns, which are important for designing planning algorithms for autonomous cars. Our queries operate over object trajectories predicted by a deep object tracking model. Then, a query consists of regular expression operators used to compose underlying predicates (e.g., whether a car is in a lane), and selects a subset of trajectories. A key challenge is that queries are difficult for end users to develop: queries must reason about complex spatial and temporal patterns in object trajectories in order to select trajectories of interest, and predicates often include realvalued parameters (e.g., whether two cars are within a certain distance) that can be tedious to manually tune. Thus, Quivr automatically synthesizes queries given examples of trajectories that the query should match. To make the synthesis procedure efficient, we use overapproximations to prune invalid branches of the query search space, including using a quantitative variant of our query semantics to efficiently prune the search space over parameter values. We also propose two optimizations for speeding up the execution of our queries. Finally, we leverage an active learning strategy to disambiguate between multiple consistent candidate queries by collecting additional labels from the user. We evaluate Quivr on a benchmark of 11 tasks, and demonstrate that it can synthesize accurate queries for each task given just a few examples, and that our pruning strategy and optimizations substantially reduce synthesis time. 
Stephen Mell · Favyen Bastani · Stephan Zdancewic · Osbert Bastani 🔗 


Augmenting Classic Algorithms with Neural Components for Strong Generalisation on Ambiguous and HighDimensional Data
(
Poster
)
link
We augment classic algorithms with learned components to adapt them to domains currently dominated by deep learning models. Two traditional sorting algorithms with learnable neural building blocks are applied to visual data with apriori unknown symbols and rules. The models are quickly and reliably trained endtoend in a supervised setting. Our models learn symbol representations and generalise better than generic neural network models to longer input sequences. 
Imanol Schlag · Jürgen Schmidhuber 🔗 


MetaLearning an Inference Algorithm for Probabilistic Programs
(
Poster
)
link
We present a metaalgorithm for learning a posteriorinference algorithm for restricted probabilistic programs. Our metaalgorithm takes a training set of probabilistic programs that describe models with observations, and attempts to learn an efficient method for inferring the posterior of a similar program.A key feature of our approach is the use of what we call a whitebox inference algorithm that analyses the given program sequentially using multiple neural networks to compute an approximate posterior.The parameters of these networks are learnt from a training set by our metaalgorithm.We empirically demonstrate that the learnt inference algorithm generalises well to programs that are new in terms of both parameters and model structures, and report cases where our approach achieves greater testtime efficiency than alternatives such as HMC. 
Gwonsoo Che · Hongseok Yang 🔗 


Scallop: From Probabilistic Deductive Databases to Scalable Differentiable Reasoning
(
Poster
)
link
Deep learning and symbolic reasoning are complementary techniques for an intelligent system. However, principled combinations of these techniques are typically limited in scalability, rendering them illsuited for realworld applications. We propose Scallop, a system that builds upon probabilistic deductive databases, to bridge this gap. On synthetic tasks involving mathematical and logical reasoning, Scallop scales significantly better without sacrificing accuracy compared to DeepProbLog, a principled neural logic programming approach. Scallop also scales to a realworld Visual Question Answering (VQA) benchmark that requires multihop reasoning, achieving 84.22% accuracy and outperforming two VQAtailored models based on Neural Module Networks and transformers by 12.42% and 21.66% respectively. 
Jiani Huang · Ziyang Li · Binghong Chen · Karan Samel · Mayur Naik · Le Song · Xujie Si 🔗 


LazyPPL: laziness and types in nonparametric probabilistic programs
(
Poster
)
link
We introduce LazyPPL, a prototype probabilistic programming library for Haskell. The library emphasises the clarifying power of types, and the connection between nonparametric, stochastic processes and lazy (call by need) evaluation. We illustrate the power of the language with natural specifications of infinite structures including Poisson point processes, Gaussian processes, and Dirichlet Process clustering. 
Hugo Paquet · Sam Staton 🔗 


Proof Extraction for Logical Neural Networks
(
Poster
)
link
Automated Theorem Provers (ATPs) are widely used for the verification of logical statements. Explainability is one of the key advantages of ATPs: providing an expert readable proof path which shows the inference steps taken to conclude correctness. Conversely, NeuroSymbolic Networks (NSNs) that perform theorem proving, do not have this capability. We propose a prooftracing and filtering algorithm to provide explainable reasoning in the case of Logical Neural Networks (LNNs), a special type of NeuralTheorem Prover (NTP). 
Thabang Lebese · Ndivhuwo Makondo · Cristina Cornelio · Naweed A Khan 🔗 


A Genetic Programming Approach To ZeroShot Neural Architecture Ranking
(
Poster
)
link
Neural networks are becoming increasingly ubiquitous in a wide range of use cases. A primary hurdle in deploying neural networks in many scenarios is the tedious and difficult neural network architectural design process, which was reliant on expert knowledge and iterative design. Neural Architecture Search (NAS) reduces the human effort required for design, but still has considerable resource requirements and is extremely slow. To address the inefficiencies of conventional NAS, ZeroShot NAS is a new paradigm, which introduces zero shot neural architecture scoring metrics (NASMs) to identify good neural network designs without training them. While applying Zero Shot NASMs is cheap and requires no training resources, we identify that there is a lack of NASMs that generalize well across neural architecture design spaces. In this paper, we present a program representation for NASMs and automate its search with genetic programming. We discover effective NASMs for Image Classification as well as Automatic Speech Recognition. We believe that our work indicates a new direction for NASM design and can greatly benefit from recent advances in program synthesis. 
Yash Akhauri · Juan Munoz · Ravishankar Iyer · Nilesh Jain 🔗 