Differentiable Programming Workshop
Ludger Paehler · William Moses · Maria I Gorinova · Assefaw H. Gebremedhin · Jan Hueckelheim · Sri Hari Krishna Narayanan
Differentiable programming allows for automatically computing derivatives of functions within a high-level language. It has become increasingly popular within the machine learning (ML) community: differentiable programming has been used within backpropagation of neural networks, probabilistic programming, and Bayesian inference. Fundamentally, differentiable programming frameworks empower machine learning and its applications: the availability of efficient and composable automatic differentiation (AD) tools has led to advances in optimization, differentiable simulators, engineering, and science.
While AD tools have greatly increased the productivity of ML scientists and practitioners, many problems remain unsolved. Crucially, there is little communication between the broad group of AD users, the programming languages researchers, and the differentiable programming developers, resulting in them working in isolation. We propose a Differentiable Programming workshop as a forum to narrow the gaps between differentiable and probabilistic language design, efficient automatic differentiation engines, and higher-level applications of differentiable programming. We hope this workshop will foster closer collaboration between language designers and domain scientists by bringing together a diverse part of the differentiable programming community, including people working on core automatic differentiation tools, higher-level frameworks that rely upon AD (such as probabilistic programming and differentiable simulators), and applications that use differentiable programs to solve scientific problems.
The explicit goals of the workshop are to:
1. Foster closer collaboration and synergies between the individual communities;
2. Evaluate the merits of differentiable design constructs and the impact they have on the algorithm design space and usability of the language;
3. Highlight differentiable techniques of individual domains, and the potential they hold for other fields.
Schedule
Mon 6:00 a.m. - 6:05 a.m.

Welcome (Short Introduction & Welcome to the Workshop)
Mon 6:05 a.m. - 6:35 a.m.

Parallel-Friendly Automatic Differentiation in Dex and JAX (Invited Talk)
Adam Paszke
Mon 6:35 a.m. - 7:05 a.m.

SYMPAIS: SYMbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis (Invited Talk)
Yuan Zhou
Mon 7:05 a.m. - 7:20 a.m.

Differentiable Scripting (Oral)
In Computational Science, Engineering and Finance (CSEF) scripts typically serve as the "glue" between potentially highly complex and computationally expensive external subprograms. Differentiability of the resulting programs turns out to be essential in the context of derivative-based methods for error analysis, uncertainty quantification, optimization or training of surrogates. We argue that it should be enforced by the scripting language itself through exclusive support of differentiable (smoothed) external subprograms and differentiable intrinsics combined with prohibition of non-differentiable branches in the data flow. Illustration is provided by a prototype adjoint code compiler for a simple Python-like scripting language.
Uwe Naumann
Mon 7:20 a.m. - 7:35 a.m.

A research framework for writing differentiable PDE discretizations in JAX (Oral)
Differentiable simulators are an emerging concept with applications in several fields, from reinforcement learning to optimal control. Their distinguishing feature is the ability to calculate analytic gradients with respect to the input parameters. Like neural networks, which are constructed by composing several building blocks called layers, a simulation often requires computing the output of an operator that can itself be decomposed into elementary units chained together. While each layer of a neural network represents a specific discrete operation, the same operator can have multiple representations, depending on the discretization employed and the research question that needs to be addressed. Here, we propose a simple design pattern to construct a library of differentiable operators and discretizations, by representing operators as mappings between families of continuous functions, parametrized by finite vectors. We demonstrate the approach on an acoustic optimization problem, where the Helmholtz equation is discretized using Fourier spectral methods, and differentiability is demonstrated using gradient descent to optimize the speed of sound of an acoustic lens.
Antonio Stanziola · Simon Arridge
Mon 7:35 a.m. - 7:50 a.m.

Break (Break)
Mon 7:50 a.m. - 8:20 a.m.

Differentiable Programming in Molecular Physics (Invited Talk)
Frank Noe
Mon 8:20 a.m. - 8:50 a.m.

Diffractor.jl: High Level, High Performance AD for Julia (Invited Talk)
Keno Fischer
Mon 8:50 a.m. - 9:05 a.m.

Equinox: neural networks in JAX via callable PyTrees and filtered transformations (Oral)
JAX and PyTorch are two popular Python autodifferentiation frameworks. JAX is based around pure functions and functional programming. PyTorch has popularised the use of an object-oriented (OO) class-based syntax for defining parameterised functions, such as neural networks. That this seems like a fundamental difference means current libraries for building parameterised functions in JAX have either rejected the OO approach entirely (Stax) or have introduced OO-to-functional transformations, multiple new abstractions, and been limited in the extent to which they integrate with JAX (Flax, Haiku, Objax). Either way this OO/functional difference has been a source of tension. Here, we introduce
Patrick Kidger
Mon 9:05 a.m. - 9:20 a.m.

A fully-differentiable compressible high-order computational fluid dynamics solver (Oral)
Fluid flows are omnipresent in nature and engineering disciplines. The reliable computation of fluids has been a long-lasting challenge due to nonlinear interactions over multiple spatio-temporal scales. The compressible Navier-Stokes equations govern compressible flows and allow for complex phenomena like turbulence and shocks. Despite tremendous progress in hardware and software, capturing the smallest length scales in fluid flows still introduces prohibitive computational cost for real-life applications. We are currently witnessing a paradigm shift towards machine-learning-supported design of numerical schemes as a means to tackle the aforementioned problem. While prior work has explored differentiable algorithms for one- or two-dimensional incompressible fluid flows, we present a fully-differentiable framework for the computation of compressible fluid flows using high-order state-of-the-art numerical methods. Firstly, we demonstrate the efficiency of our solver by computing classical two- and three-dimensional test cases, including strong shocks and transition to turbulence. Secondly, and more importantly, our framework allows for end-to-end optimization to improve existing numerical schemes inside computational fluid dynamics algorithms. In particular, we are using neural networks to substitute a conventional numerical flux function.
Deniz Bezgin
Mon 9:20 a.m. - 9:25 a.m.

Short Break (Break)
Mon 9:25 a.m. - 10:40 a.m.

Poster Session (Poster Session)
Mon 9:25 a.m. - 10:40 a.m.

Extended Abstract – Enzyme.jl: Low level autodifferentiation meets high-level language (Poster)
Valentin Churavy
Mon 9:25 a.m. - 10:40 a.m.

GPU Accelerated Automatic Differentiation with Clad (Poster)
Automatic Differentiation (AD) is a fundamental method that empowers computational algorithms across a range of fields, including Machine Learning, Robotics and High Energy Physics. We present methods enabling well-behaved C++ functions to be automatically differentiated on a GPU without need of code modification. This work brings forth the potential of a new layer of optimisation and a proportional speed-up when computing gradients. The aim of this effort is to provide a tool for AD that can be easily integrated into existing frameworks as a compiler plugin extending the Clang compiler. It can be used interactively, as a Jupyter kernel extension, or as a plugin extending an interactive environment. It will provide researchers with the means to reuse pre-existing models and have their workloads scheduled on parallel processors without the need to optimise their computational kernels.
Vassil Vassilev · David Lange
Mon 9:25 a.m. - 10:40 a.m.

Unbiased Reparametrisation Gradient via Smoothing and Diagonalisation (Poster)
It is well-known that the reparametrisation gradient estimator for non-differentiable models is biased. To formalise the problem, we consider a variant of the simply-typed lambda calculus which supports the reparametrisation of arguments. We endow this language with a denotational semantics based on the cartesian closed category of Frölicher spaces (parameterised by a smoothing accuracy), which generalise smooth manifolds. Finally, we apply the standard reparametrisation gradient to the smoothed model and show that by enhancing the accuracy of the smoothing in a diagonalisation fashion we converge to a critical point of the original optimisation problem.
Dominik Wagner · Luke Ong
Mon 9:25 a.m. - 10:40 a.m.

Gradients of the Big Bang: Solving the Einstein-Boltzmann Equations with Automatic Differentiation (Poster)
Our best estimates of the age, contents, and geometry of the Universe come from comparing predictions of the Einstein-Boltzmann (EB) equations with observations of galaxies and the afterglow of the Big Bang. Existing EB solvers are not differentiable, and Bayesian parameter estimation of these differential equation models is thus restricted to employing gradient-free inference algorithms. This becomes intractable in the high-dimensional settings increasingly relevant for modern observations. Propagating derivatives through the numerical solution of these ordinary differential equations is tractable through automatic differentiation (AD). We are actively developing the first AD-enabled EB solver, Bolt.jl, making use of the rich Julia ecosystem of AD tools. Beyond mitigating the cost of high-dimensional inference, Bolt.jl opens the door to testing new cosmological physics against data at the level of terms in the Einstein-Boltzmann equations, using neural ODEs and physics-informed neural networks (PINNs).
James Sullivan
Mon 9:25 a.m. - 10:40 a.m.

Differentiable Parametric Optimization Approach to Power System Load Modeling (Poster)
In this work, we propose a differentiable programming approach to data-driven modeling of distribution systems for electromechanical transient stability analysis. Our approach combines the traditional ZIP load model with a deep neural network formulated as a constrained nonlinear least-squares problem. We will discuss the formulation, setup, and training of the proposed model as a differentiable program. Finally, we will compare and investigate the performance of this new load model and present the results on a medium-scale 350-bus transmission-distribution network.
Jan Drgona · Andrew August · Elliott Skomski
Mon 9:25 a.m. - 10:40 a.m.

On automatic differentiation for the Matern covariance (Poster)
To target challenges in differentiable optimization we analyze and propose strategies for derivatives of the Matérn kernel with respect to the smoothness parameter. This problem poses a challenge in Gaussian processes modelling due to the lack of robust derivatives of the modified Bessel function of second kind. In the current work we scrutinize the mathematical and numerical hurdles posed by the differentiation of special functions and provide a set of options. Special focus is given to a newly derived series expansion for the modified Bessel function of second kind which yields highly accurate results using the complex step method and is promising for classical AD implementations.
Oana Marin · Paul Hovland
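The complex step method named in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' series expansion and does not involve Bessel functions: the test function `example_function`, its reference derivative `example_derivative`, and the step size `h` are all assumptions chosen only because the exact derivative is known in closed form. The idea is that for a real-analytic function, the imaginary part of f(x + ih) divided by h approximates f'(x) without the subtractive cancellation of finite differences.

```python
import cmath
import math


def complex_step_derivative(f, x, h=1e-20):
    """Approximate f'(x) for a real-analytic f via Im(f(x + ih)) / h."""
    return f(complex(x, h)).imag / h


def example_function(z):
    # Hypothetical test function, chosen only because its derivative is known.
    return cmath.exp(z) * cmath.sin(z)


def example_derivative(x):
    # Analytic derivative of exp(x) * sin(x), used as the reference value.
    return math.exp(x) * (math.sin(x) + math.cos(x))


if __name__ == "__main__":
    x0 = 0.7
    approx = complex_step_derivative(example_function, x0)
    exact = example_derivative(x0)
    print(approx, exact)
```

Because no difference of nearly equal numbers is formed, the step can be made extremely small; the function must, however, be implemented with complex-safe operations throughout.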
Mon 9:25 a.m. - 10:40 a.m.

Neural Differentiable Predictive Control (Poster)
We present a neural differentiable predictive control (DPC) method for learning constrained neural control policies for uncertain linear systems. DPC is formulated as a differentiable problem whose computational graph architecture is inspired by classical model predictive control (MPC) structure. In particular, the optimization of the neural control policy is based on automatic differentiation of the MPC loss function through a differentiable closed-loop system dynamics model. We show that DPC can learn constrained neural control policies to stabilize systems with unstable dynamics, track time-varying references, and satisfy state and input constraints without the prior need of a supervisory MPC controller.
Jan Drgona · Aaron Tuor · Draguna Vrabie
Mon 9:25 a.m. - 10:40 a.m.

AbstractDifferentiation.jl: Backend-Agnostic Differentiable Programming in Julia (Poster)
No single Automatic Differentiation (AD) system is the optimal choice for all problems. This means informed selection of an AD system and combinations can be a problem-specific variable that can greatly impact performance. In the Julia programming language, the major AD systems target the same input and thus in theory can compose. Hitherto, switching between AD packages in the Julia Language required end-users to familiarize themselves with the user-facing API of the respective packages. Furthermore, implementing a new, usable AD package required AD package developers to write boilerplate code to define convenience API functions for end-users. As a response to these issues, we present AbstractDifferentiation.jl for the automatized generation of an extensive, unified, user-facing API for any AD package. By splitting the complexity between AD users and AD developers, AD package developers only need to implement one or two primitive definitions to support various utilities for AD users like Jacobians, Hessians and lazy product operators from native primitives such as pullbacks or pushforwards, thus removing tedious - but so far inevitable - boilerplate code, and enabling the easy switching and composing between AD implementations for end-users.
Frank Schäfer · Mohamed Tarek · Lyndon White · Christopher Rackauckas
Mon 9:25 a.m. - 10:40 a.m.

Aggregated type handling in AD tape implementations (Poster)
The development of AD tools focuses mostly on handling floating point types in the target language. Taping optimizations in these tools mostly focus on specific operations like matrix vector products. Aggregated types like std::complex are usually handled by specifying the AD type as a template argument. This approach provides exact results, but prevents the use of expression templates. If AD tools are extended and specialized such that aggregated types can be added to the expression framework, then this will result in reduced memory utilization and improve the timing for applications where aggregated types such as complex numbers, matrix vector operations or layer operations in neural networks are used. Such an integration requires a reformulation of the stored data per expression and a rework of the tape evaluation process. In this paper we demonstrate the overhead of unhandled aggregated types in expression templates and provide basic ingredients for a tape implementation that supports arbitrary aggregated types for which the user has implemented some type traits. Finally, we demonstrate the advantages of aggregated type handling on a synthetic benchmark case.
Max Sagebaum
Mon 9:25 a.m. - 10:40 a.m.

Backpropagation through Back substitution with a Backslash (Poster)
We present a linear algebra formulation of backpropagation that serves as an alternative to the traditional approach. Using matrices allows the calculation of gradients given the availability of a generically written Gaussian elimination, which is represented by the "backslash" symbol. Backpropagation is often connected to the chain rule for multivariate calculus, but we propose that this may be seen as a distraction from the underlying algebraic structure. The implementation shows how generic linear algebra can allow operators as elements of matrices, and without rewriting of any code, the software carries through to completion giving the correct answer. We demonstrate that in a suitable programming language consisting of generic linear algebra operators, such as Julia (Bezanson et al., 2017), it is possible to realize this abstraction in code.
Ekin Akyürek · Alan Edelman · Bernie Wang
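A scalar-only Python sketch can give a feel for the idea of obtaining gradients from a triangular solve. It is not the authors' operator-valued Julia formulation: the helper names `back_substitute` and `chain_gradient`, and the example chain `STEPS`, are assumptions made for illustration. For a chain x[i+1] = f_i(x[i]), the local derivatives are placed below the diagonal of a matrix L, and the gradient of the final value with respect to every intermediate value is the solution of (I - L^T) g = e_last, computed here by plain back substitution.

```python
import math


def back_substitute(upper, rhs):
    """Solve upper @ x = rhs for an upper-triangular matrix (list of lists)."""
    n = len(rhs)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = rhs[i] - sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / upper[i][i]
    return x


def chain_gradient(functions, derivatives, x0):
    """Return chain values and d(last value)/d(each value) via a triangular solve."""
    # Forward pass: values[i + 1] = functions[i](values[i]).
    values = [x0]
    for f in functions:
        values.append(f(values[-1]))
    n = len(values)

    # Build I - L^T, where L[i + 1][i] is the local derivative of step i.
    system = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, df in enumerate(derivatives):
        system[i][i + 1] = -df(values[i])

    # The right-hand side selects the final value of the chain.
    rhs = [0.0] * (n - 1) + [1.0]
    return values, back_substitute(system, rhs)


# Hypothetical example chain: y = exp(sin(x) ** 2).
STEPS = [math.sin, lambda v: v * v, math.exp]
STEP_DERIVATIVES = [math.cos, lambda v: 2.0 * v, math.exp]

if __name__ == "__main__":
    values, grad = chain_gradient(STEPS, STEP_DERIVATIVES, 0.5)
    print(values[-1], grad[0])
```

The first entry of the returned gradient is the derivative of the output with respect to the input; the remaining entries are the derivatives with respect to the intermediate values, which is what a reverse pass would also produce.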
Mon 10:40 a.m. - 10:45 a.m.

Short Break (Break)
Mon 10:45 a.m. - 11:15 a.m.

Learning from Data through the Lens of Ocean Models, Surrogates, and their Derivatives (Invited Talk)
Patrick Heimbach
Mon 11:15 a.m. - 11:45 a.m.

Learnable Physics Models (Invited Talk)
Karen Liu
Mon 11:45 a.m. - 12:00 p.m.

Escaping the abstraction: a foreign function interface for the Unified Form Language [UFL] (Oral)
High level domain specific languages for the finite element method underpin high productivity programming environments for simulations based on partial differential equations (PDE) while employing automatic code generation to achieve high performance. However, a limitation of this approach is that it does not support operators that are not directly expressible in the vector calculus. This is critical in applications where PDEs are not enough to accurately describe the physical problem of interest. The use of deep learning techniques has become increasingly popular in filling this knowledge gap, for example to include features not represented in the differential equations, or closures for unresolved spatiotemporal scales. We introduce an interface within the Firedrake finite element system that enables a seamless interface with deep learning models. This new feature composes with the automatic differentiation capabilities of Firedrake, enabling the automated solution of inverse problems. Our implementation interfaces with PyTorch and can be extended to other machine learning libraries. The resulting framework supports complex models coupling PDEs and deep learning whilst maintaining separation of concerns between application scientists and software experts.
Nacime Bouziani
Mon 12:00 p.m. - 12:15 p.m.

Towards Denotational Semantics of AD for Higher-Order, Recursive, Probabilistic Languages (Oral)
Automatic differentiation (AD) aims to compute derivatives of user-defined functions, but in Turing-complete languages, this simple specification does not fully capture AD's behavior: AD sometimes disagrees with the true derivative of a differentiable program, and when AD is applied to non-differentiable or effectful programs, it is unclear what guarantees (if any) hold of the resulting code. We study an expressive differentiable programming language, with piecewise-analytic primitives, higher-order functions, and general recursion. Our main result is that even in this general setting, a version of Lee et al. [2020]'s correctness theorem (originally proven for a first-order language without partiality or recursion) holds: all programs denote so-called ωPAP functions, and AD computes correct intensional derivatives of them. Mazza and Pagani [2021]'s recent theorem, that AD disagrees with the true derivative of a differentiable recursive program at a measure-zero set of inputs, can be derived as a straightforward corollary of this fact. We also apply the framework to study probabilistic programs, and recover a recent result from Mak et al. [2021] via a novel denotational argument.
Alexander Lew · Mathieu Huot · Vikash Mansinghka
Mon 12:15 p.m. - 12:30 p.m.

Break (Break)
Mon 12:30 p.m. - 1:00 p.m.

Differentiable Programming for Protein Sequences and Structure (Invited Talk)
Sergey Ovchinnikov
Mon 1:00 p.m. - 1:30 p.m.

Approximate High Performance Computing Guided by Automatic Differentiation (Invited Talk)
Harshitha Menon Menon
Mon 1:30 p.m. - 1:45 p.m.

A Complete Axiomatization of Forward Differentiation (Oral)
We give a complete decidable second-order equational axiomatisation of the forward differentiation of smooth multivariate functions. Differentiation is expressed using the binding structures available in second-order equational logic. The main mathematical theorem used is Severi's multivariate Hermite interpolation theorem.
Gordon Plotkin
Mon 1:45 p.m. - 2:00 p.m.

Generalizability of density functionals learned from differentiable programming on weakly correlated spin-polarized systems (Oral)
Kohn-Sham regularizer (KSR) is a machine learning approach that optimizes a physics-informed exchange-correlation functional within a differentiable Kohn-Sham density functional theory framework. We evaluate the generalizability of KSR by training on atomic systems and testing on molecules at equilibrium. We propose a spin-polarized version of KSR with local, semilocal, and nonlocal approximations for the exchange-correlation functional. The generalization error from our semilocal approximation is comparable to other differentiable approaches. Our nonlocal functional outperforms any existing machine learning functionals by predicting the ground-state energies of the test systems with a mean absolute error of 2.7 milliHartrees.
Bhupalee Kalita · Ryan Pederson · Li Li · Kieron Burke
Mon 2:00 p.m. - 3:00 p.m.

Social (Social)