Timezone: »

Advances in Programming Languages and Neurosymbolic Systems (AIPLANS)
Breandan Considine · Disha Shrivastava · David Yu-Tung Hui · Chin-Wei Huang · Shawn Tan · Xujie Si · Prakash Panangaden · Guy Van den Broeck · Daniel Tarlow

Tue Dec 14 03:45 AM -- 03:00 PM (PST) @ None
Event URL: https://aiplans.github.io/ »

Neural information processing systems have benefited tremendously from the availability of programming languages and frameworks for automatic differentiation (AD). Not only do NeurIPS benefit from programming languages for automatic inference but can also be considered as a language in their own right, consisting of differentiable and stochastic primitives. Combined with neural language models, these systems are increasingly capable of generating symbolic programs a human programmer might write in a high-level language. Developing neurosymbolic systems for automatic program synthesis requires insights from both statistical learning and programming languages.

AIPLANS invites all researchers working towards the same purpose in these two communities to build on common ground. Our workshop is designed to be as inclusive as possible towards researchers engaged in building programming languages and neurosymbolic systems.

Tue 3:45 a.m. - 4:00 a.m.
Introductory remarks (Introductory Remarks)
Tue 4:00 a.m. - 5:00 a.m.

Transformers - the purely attention based NN architecture - have emerged as a powerful tool in sequence processing. But how does a transformer think? When we discuss the computational power of RNNs, or consider a problem that they have solved, it is easy for us to think in terms of automata and their variants (such as counter machines and pushdown automata). But when it comes to transformers, no such intuitive model is available.

In this talk I will present a programming language, RASP (Restricted Access Sequence Processing), which we hope will serve the same purpose for transformers as finite state machines do for RNNs. In particular, we will identify the base computations of a transformer and abstract them into a small number of primitives, which are composed into a small programming language. We will go through some example programs in the language, and discuss how a given RASP program relates to the transformer architecture.

Gail Weiss
Tue 5:00 a.m. - 6:00 a.m.

I will explore the boundaries between differentiable programming and logic, through the prism of the Curry-Howard correspondence. I will recall the latter and explain how automatic differentiation fits into each of its three facets: functions, proofs and programs. In particular, I will explain how backpropagation is identified with Gödel's Dialectica translation, a transformation of logical formulas historically used to prove consistency theorems and widely used in proof theory since then.

Tue 6:00 a.m. - 7:00 a.m.
Joshua Tenenbaum Massachusetts Institute of Technology (Invited Talk)
Josh Tenenbaum
Tue 7:00 a.m. - 8:00 a.m.
Panel Discussion
Tue 8:00 a.m. - 9:00 a.m.
Daniel Selsam Microsoft Research (Tutorial)
Daniel Selsam
Tue 9:00 a.m. - 10:30 a.m.
Lunch / Poster Session (Poster Session) [ Visit Poster at Spot C6 in Virtual World ]  link »
Tue 10:30 a.m. - 11:00 a.m.

Optimization is at the heart of machine learning, and gradient computation is central to many optimization techniques. Stochastic optimization, in particular, has taken center stage as the principal method of fitting many models, from deep neural networks to variational Bayesian posterior approximations. Generally, one uses data subsampling to efficiently construct unbiased gradient estimators for stochastic optimization, but this is only one possibility. In this talk, I will discuss an alternative approach to constructing unbiased gradient estimates in machine learning problems. We will revisit the Jacobian accumulation problem at the heart of automatic differentiation, observing that it is possible to collapse the linearized computational graph of, e.g., deep neural networks, in a randomized way such that less memory is used but little performance is lost. This is joint work with students Alex Beatson, Deniz Oktay, Joshua Aduol, and Nick McGreivy.

Ryan Adams
Tue 11:00 a.m. - 12:00 p.m.
David Duvenaud University of Toronto (Invited Talk)
David Duvenaud · AIPLANS 2021
Tue 12:00 p.m. - 1:00 p.m.

Differential Inference is the use of differentiation to perform probabilistic inference. The technique itself is relatively straightforward and plays nicely with autodiff: it roughly just automates Bayes' rule the way autodiff automates the chain rule. However, there is still a tendency for students to get tied up in the knots of even elementary probabilistic inference. Inspired by polemics that shined light on autodifferentiation, this talk will be half a tutorial on the use of differential inference and half a demonstration of all the fun math that it can remove from your life.

Alexander Rush
Tue 1:05 p.m. - 1:15 p.m.
Spotlight Talks from Authors 1 (Spotlight Talks)   
AIPLANS 2021 · Gwonsoo Che
Tue 1:15 p.m. - 1:25 p.m.
Spotlight Talks from Authors 2 (Spotlight Talk)
AIPLANS 2021 · Hugo Paquet
Tue 1:25 p.m. - 1:35 p.m.
Spotlight Talks from Authors 3 (Spotlight Talks)
AIPLANS 2021 · Giri Krishnan
Tue 1:35 p.m. - 1:45 p.m.
Spotlight Talks from Authors 4 (Spotlight Talk)
AIPLANS 2021 · Eirini V. Pandi
Tue 1:45 p.m. - 1:55 p.m.
Spotlight Talks from Authors 5 (Spotlight Talk)
AIPLANS 2021 · Róbert Csordás
Tue 2:00 p.m. - 3:00 p.m.
Poster Session  link » AIPLANS 2021
[ OpenReview [ Visit Poster at Spot C5 in Virtual World ]  link »

Optionally typed dynamic languages can permit multiple valid type assignments. When this happens, developers can prefer one valid type assignment over another because it better reflects how they think about the program and the problem it solves. Natural type inference (NTI) uses natural language text within source code, such as identifiers, to help choose valid programming language types. A growing body of techniques has been proposed for NTI. These techniques predict types; they seek to return natural type assignments (assignments that reflect developer preferences) while striving for correctness. They are empirically effective, but they are not sound by construction: they do not leverage programming language theory to formalize their algorithms and show correctness and termination. Filling this foundational gap is the purpose of this paper. We are the first to present a detailed algorithm for NTI that is validated with theorems and proofs. Valid type assignments obey logical constraints arising from type rules; natural type assignments obey natural constraints arising from the natural language text associated with a variable and its uses.The core intuition of this work is that logical and natural constraints can interact to speed finding a type valuation that 1. type checks (satisfies the logical constraints) and 2. is most natural.We formulate NTI as a joint optimization problem. To do this, we define a numerical relaxation over boolean logical constraints that give us a condition that we treat as a hard constraint, while simultaneously we minimize distance from natural constraints, which we treat as soft constraints for our optimization problem. Our main result, the first formal proof of soundness for natural type inference, is that our algorithm always terminates, either with an error or with a tuple that is guaranteed to be a type signature for its input.

Eirini V. Pandi · Earl Barr · Andrew Gordon · Charles Sutton
[ OpenReview [ Visit Poster at Spot C4 in Virtual World ]  link »

Recent works have shown the incredible promise of using neural networks for the task of program synthesis from input-output examples. In this paper, we propose using Transformer-based architectures as a baseline for the program synthesis task on the Karel dataset. Specifically, we leverage DistillGPT as our decoder model to perform program synthesis for the Karel DSL. We show that changing the model architecture from an LSTM to a transformer based architecture, we are able to significantly improve on supervised learning approaches obtaining a top-1 generalization accuracy of 82.4%. Further, applying execution guided search on the output beams of the model increases the accuracy of our approach to 89.64%.

Abhay Garg · Anand Sriraman · Shirish Karande
[ OpenReview [ Visit Poster at Spot C3 in Virtual World ]  link »

This paper explores the capabilities of current transformer-based language models for program evaluation of simple functional programming languages. We introduce a new program generation mechanism that allows control over syntactic sugar for semantically equivalent programs. T5 experiments reveal that neural functional program evaluation performs surprisingly well, achieving high 90% exact program match scores for most in-distribution and out-of-distribution tests. We present and evaluate on three datasets to study generalization abilities that are specific to functional programs based on: type, function composition, and reduction steps.

Torsten Scholak · Jonathan Pilault
[ OpenReview [ Visit Poster at Spot C2 in Virtual World ]  link »

We present our current progress towards a metaprogramming framework for tensor expressions embedded in Haskell; the system offers a high-level syntax for dimension-annotated linear algebra, and generates specialized source code corresponding to the input expression.

Marco Zocca
[ OpenReview [ Visit Poster at Spot C1 in Virtual World ]  link »

We study the problem of learning verifiably safe parameters for programs that use neural networks as well as symbolic, human-written code. Such neurosymbolic programs arise in many safety-critical domains. However, because they need not be differentiable, they cannot be learned using existing approaches to integrating learning and verification. Our method, Differentiable Symbolic Execution (DSE), learns such programs by sampling code paths using symbolic execution, constructing gradients of a worst-case ``safety loss'' along these paths, and then backpropagating these gradients through program operations using a generalization of the reinforce estimator. We evaluate the method on a mix of synthetic tasks and real-world control and navigation benchmarks. Our experiments show that DSE significantly outperforms the state-of-the-art DiffAI method on these tasks.

Chenxi Yang · Swarat Chaudhuri
[ OpenReview [ Visit Poster at Spot C0 in Virtual World ]  link »

Program synthesis from natural language descriptions is a challenging task. This paper explores two variants of transformer models for the task of program synthesis and showcase higher performance than the existing SOTA models. Through the end, we also discuss the differences in learned representation in these two variants. We demonstrate that the vanilla transformer model has a higher capacity to memorize the training data as compared to the other variant.

Mrinal None Anand · Mayank Singh
[ OpenReview [ Visit Poster at Spot B6 in Virtual World ]  link »

Differentiable methods to learn rules (logic programs) have the potential to integrate the interpretability, transferability and low data requirements of inductive logic programming with the noise tolerance of non-symbolic learning. While negation is an essential component of reasoning, incorporating it into a logic programming framework poses several problems (hence its central place in the logic programming and nonmonotonic reasoning communities). Current implementations of differentiable rule learners either exclude negation entirely or else treat it only in passing. In this work, we introduce stratified negation into a differentiable inductive logic programming framework, and we demonstrate that the resulting system can learn recursive programs in which negation plays a central role. We include examples from multiple domains, e.g., arithmetic, graph, sets and lists.

Giri Krishnan · Ramyaa Ramyaa
[ OpenReview [ Visit Poster at Spot B5 in Virtual World ]  link »

The human ability to efficiently discover causal theories of their environments from observations is a feat of nature that remains elusive in machines. In this work, we attempt to make progress on this frontier by formulating the challenge of causal mechanism discovery from observed data as one of program synthesis. We focus on the domain of time-varying, Atari-like 2D grid worlds, and represent causal models in this domain using a programming language called Autumn. Discovering the causal structure underlying a sequence of observations is equivalent to identifying the program in the Autumn language that generates the observations. We introduce a novel program synthesis algorithm, called AutumnSynth, that approaches this synthesis challenge by integrating standard methods of synthesizing functions with an automata synthesis approach, used to discover the model's latent state. We evaluate our method on a suite of Autumn programs designed to express the richness of the domain, and our results signal the potential of our formulation.

Ria Das · Zenna Tavares · Josh Tenenbaum · Armando Solar-Lezama
[ OpenReview [ Visit Poster at Spot B4 in Virtual World ]  link »

We study the problem of synthesizing programs that include machine learning components such as deep neural networks (DNNs). We focus on statistical properties, which are properties expected to hold with high probability---e.g., that an image classification model correctly identifies people in images with high probability. We propose novel algorithms for sketching and synthesizing such programs by leveraging ideas from statistical learning theory to provide statistical soundness guarantees. We evaluate our approach on synthesizing list processing programs that include DNN components used to process image inputs, as well as case studies on image classification and on precision medicine. Our results demonstrate that our approach can be used to synthesize programs with probabilistic guarantees.

Osbert Bastani
[ OpenReview [ Visit Poster at Spot B3 in Virtual World ]  link »

One of the most challenging goals in designing intelligent systems is empowering them with the ability to synthesize programs from data. Namely, given specific requirements in the form of input/output pairs, the goal is to train a machine learning model to discover a program that satisfies those requirements. A recent class of methods exploits combinatorial search procedures and deep learning to learn compositional programs. However, they usually generate only toy programs using a domain-specific language that does not provide any high-level feature, such as function arguments, which reduces their applicability in real-world settings. We extend upon a state of the art model, AlphaNPI, by learning to generate functions that can accept arguments. This improvement will enable us to move closer to real computer programs. We showcase the potential of our approach by learning the Quicksort algorithm, showing how the ability to deal with arguments is crucial for learning and generalization.

Giovanni De Toni · Andrea Passerini
[ OpenReview [ Visit Poster at Spot B2 in Virtual World ]  link »

Despite successes across a broad range of applications, Transformers have limited capability in systematic generalization. The situation is especially frustrating for algorithmic tasks, where they often fail to find intuitive solutions that can be simply expressed in terms of attention patterns. Here we propose two modifications to the Transformer architecture, copy gate and geometric attention, which facilitate learning such intuitive and interpretable solutions to algorithmic problems. Our novel Transformer, called Transformer Control Flow (TCF) achieves 100% length generalization accuracy on the classic compositional table lookup task. The resulting attention and gating patterns are interpretable, demonstrating that the model implements adaptive control flow.

Róbert Csordás · Kazuki Irie · Jürgen Schmidhuber
[ OpenReview [ Visit Poster at Spot B1 in Virtual World ]  link »

The resurgence of automatic program synthesis has been observed with the rise of deep learning. In this paper, we study the behaviour of the program synthesis model under adversarial settings. Our experiments suggest that these program synthesis models are prone to adversarial attacks. The proposed transformer model have higher adversarial performance than the current state-of-the-art program synthesis model. We specifically experiment with \textsc{AlgoLisp} DSL-based generative models and showcase the existence of significant dataset bias through different classes of adversarial examples.

Mrinal None Anand · Mayank Singh
[ OpenReview [ Visit Poster at Spot B0 in Virtual World ]  link »

Deep learning has had a significant impact on many fields. Recently, code-to-code neural models have been used in code translation, code refinement and decompilation. However, the question of whether these models can automate compilation has yet to be investigated. In this work, we explore neural compilation, building and evaluating Transformer models that learn how to produce x86 assembler from C code.Although preliminary results are relatively weak, we make our data, models and code publicly available to encourage further research in this area.

Jordi None Armengol-Estapé · Michael O'Boyle
[ OpenReview [ Visit Poster at Spot A6 in Virtual World ]  link »

We propose a novel framework called Quivr for synthesizing queries to identify events of interest in video data. For instance, Quivr can be used to identify instances of human driving behaviors such as lane changes or left turns, which are important for designing planning algorithms for autonomous cars. Our queries operate over object trajectories predicted by a deep object tracking model. Then, a query consists of regular expression operators used to compose underlying predicates (e.g., whether a car is in a lane), and selects a subset of trajectories. A key challenge is that queries are difficult for end users to develop: queries must reason about complex spatial and temporal patterns in object trajectories in order to select trajectories of interest, and predicates often include real-valued parameters (e.g., whether two cars are within a certain distance) that can be tedious to manually tune. Thus, Quivr automatically synthesizes queries given examples of trajectories that the query should match. To make the synthesis procedure efficient, we use overapproximations to prune invalid branches of the query search space, including using a quantitative variant of our query semantics to efficiently prune the search space over parameter values. We also propose two optimizations for speeding up the execution of our queries. Finally, we leverage an active learning strategy to disambiguate between multiple consistent candidate queries by collecting additional labels from the user. We evaluate Quivr on a benchmark of 11 tasks, and demonstrate that it can synthesize accurate queries for each task given just a few examples, and that our pruning strategy and optimizations substantially reduce synthesis time.

Stephen Mell · Favyen Bastani · Stephan Zdancewic · Osbert Bastani
[ OpenReview [ Visit Poster at Spot A5 in Virtual World ]  link »

We augment classic algorithms with learned components to adapt them to domains currently dominated by deep learning models. Two traditional sorting algorithms with learnable neural building blocks are applied to visual data with apriori unknown symbols and rules. The models are quickly and reliably trained end-to-end in a supervised setting. Our models learn symbol representations and generalise better than generic neural network models to longer input sequences.

Imanol Schlag · Jürgen Schmidhuber
[ OpenReview [ Visit Poster at Spot A4 in Virtual World ]  link »

We present a meta-algorithm for learning a posterior-inference algorithm for restricted probabilistic programs. Our meta-algorithm takes a training set of probabilistic programs that describe models with observations, and attempts to learn an efficient method for inferring the posterior of a similar program.A key feature of our approach is the use of what we call a white-box inference algorithm that analyses the given program sequentially using multiple neural networks to compute an approximate posterior.The parameters of these networks are learnt from a training set by our meta-algorithm.We empirically demonstrate that the learnt inference algorithm generalises well to programs that are new in terms of both parameters and model structures, and report cases where our approach achieves greater test-time efficiency than alternatives such as HMC.

Gwonsoo Che · Hongseok Yang
[ OpenReview [ Visit Poster at Spot A3 in Virtual World ]  link »

Deep learning and symbolic reasoning are complementary techniques for an intelligent system. However, principled combinations of these techniques are typically limited in scalability, rendering them ill-suited for real-world applications. We propose Scallop, a system that builds upon probabilistic deductive databases, to bridge this gap. On synthetic tasks involving mathematical and logical reasoning, Scallop scales significantly better without sacrificing accuracy compared to DeepProbLog, a principled neural logic programming approach. Scallop also scales to a real-world Visual Question Answering (VQA) benchmark that requires multi-hop reasoning, achieving 84.22% accuracy and outperforming two VQA-tailored models based on Neural Module Networks and transformers by 12.42% and 21.66% respectively.

Jiani Huang · Ziyang Li · Binghong Chen · Karan Samel · Mayur Naik · Le Song · Xujie Si
[ OpenReview [ Visit Poster at Spot A2 in Virtual World ]  link »

We introduce LazyPPL, a prototype probabilistic programming library for Haskell. The library emphasises the clarifying power of types, and the connection between non-parametric, stochastic processes and lazy (call by need) evaluation. We illustrate the power of the language with natural specifications of infinite structures including Poisson point processes, Gaussian processes, and Dirichlet Process clustering.

Hugo Paquet · Sam Staton
[ OpenReview [ Visit Poster at Spot A1 in Virtual World ]  link »

Automated Theorem Provers (ATPs) are widely used for the verification of logical statements. Explainability is one of the key advantages of ATPs: providing an expert readable proof path which shows the inference steps taken to conclude correctness. Conversely, Neuro-Symbolic Networks (NSNs) that perform theorem proving, do not have this capability. We propose a proof-tracing and filtering algorithm to provide explainable reasoning in the case of Logical Neural Networks (LNNs), a special type of Neural-Theorem Prover (NTP).

Thabang Lebese · Ndivhuwo Makondo · Cristina Cornelio · Naweed A Khan
[ OpenReview [ Visit Poster at Spot A0 in Virtual World ]  link »

Neural networks are becoming increasingly ubiquitous in a wide range of use cases. A primary hurdle in deploying neural networks in many scenarios is the tedious and difficult neural network architectural design process, which was reliant on expert knowledge and iterative design. Neural Architecture Search (NAS) reduces the human effort required for design, but still has considerable resource requirements and is extremely slow. To address the inefficiencies of conventional NAS, Zero-Shot NAS is a new paradigm, which introduces zero shot neural architecture scoring metrics (NASMs) to identify good neural network designs without training them. While applying Zero Shot NASMs is cheap and requires no training resources, we identify that there is a lack of NASMs that generalize well across neural architecture design spaces. In this paper, we present a program representation for NASMs and automate its search with genetic programming. We discover effective NASMs for Image Classification as well as Automatic Speech Recognition. We believe that our work indicates a new direction for NASM design and can greatly benefit from recent advances in program synthesis.

Yash Akhauri · Juan Munoz · Ravishankar Iyer · Nilesh Jain

Author Information

Breandan Considine (McGill University)
Disha Shrivastava (Mila, University of Montreal)
David Yu-Tung Hui (Mila)
Chin-Wei Huang (Mila)
Shawn Tan (Mila)
Xujie Si (McGill University & Mila)
Prakash Panangaden (McGill University, Montreal)
Guy Van den Broeck (UCLA)

I am an Assistant Professor and Samueli Fellow at UCLA, in the Computer Science Department, where I direct the Statistical and Relational Artificial Intelligence (StarAI) lab. My research interests are in Machine Learning (Statistical Relational Learning, Tractable Learning), Knowledge Representation and Reasoning (Graphical Models, Lifted Probabilistic Inference, Knowledge Compilation), Applications of Probabilistic Reasoning and Learning (Probabilistic Programming, Probabilistic Databases), and Artificial Intelligence in general.

Daniel Tarlow (Microsoft Research Cambridge)

More from the Same Authors