
DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
Gabriele Corso · Hannes Stärk · Bowen Jing · Regina Barzilay · Tommi Jaakkola

Predicting the binding structure of a small molecule ligand to a protein, a task known as molecular docking, is critical to drug design. Recent deep learning methods that treat docking as a regression problem have decreased runtime compared to traditional search-based methods but have yet to offer substantial improvements in accuracy. We instead frame molecular docking as a generative modeling problem and develop DiffDock, a diffusion generative model over the non-Euclidean manifold of ligand poses. To do so, we map this manifold to the product space of the degrees of freedom (translational, rotational, and torsional) involved in docking and develop an efficient diffusion process on this space. Empirically, DiffDock obtains a 38% top-1 success rate (RMSD < 2 Å) on PDBBind, significantly outperforming the previous state-of-the-art of traditional docking (23%) and deep learning (20%) methods. Moreover, DiffDock has fast inference times and provides confidence estimates with high selective accuracy.
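To make the reported metric concrete, here is a minimal sketch of how a top-1 success rate at an RMSD threshold of 2 Å could be computed. It is an illustration, not the authors' evaluation code: the function names `rmsd` and `top1_success_rate` are hypothetical, and the sketch assumes the predicted and reference ligand atoms are already in matching order and in the same coordinate frame, with no correction for molecular symmetry.

```python
import math


def rmsd(pred, ref):
    """Root-mean-square deviation (in Å) between two lists of (x, y, z) atom coordinates.

    Assumption: atoms are listed in the same order in both poses, and no
    alignment or symmetry correction is applied.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (p - r) ** 2
        for atom_pred, atom_ref in zip(pred, ref)
        for p, r in zip(atom_pred, atom_ref)
    )
    return math.sqrt(squared / len(pred))


def top1_success_rate(top1_rmsds, threshold=2.0):
    """Fraction of complexes whose top-ranked pose has RMSD strictly below the threshold."""
    if not top1_rmsds:
        raise ValueError("need at least one complex")
    return sum(value < threshold for value in top1_rmsds) / len(top1_rmsds)
```

For example, calling `top1_success_rate` on one top-ranked RMSD value per test complex returns the fraction of complexes counted as successes; a value of 0.38 would correspond to the 38% figure quoted above.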

Author Information

Gabriele Corso (MIT)
Hannes Stärk (MIT)

I am a first-year PhD student at MIT in the CS and AI Laboratory (CSAIL) co-advised by Tommi Jaakkola and Regina Barzilay. I work on geometric deep learning and physics-inspired ML and applications in molecular biology and other physical systems.

Bowen Jing (Massachusetts Institute of Technology)
Regina Barzilay (Massachusetts Institute of Technology)
Tommi Jaakkola (MIT)

Tommi Jaakkola is a professor of Electrical Engineering and Computer Science at MIT. He received an M.Sc. degree in theoretical physics from Helsinki University of Technology and a Ph.D. in computational neuroscience from MIT. Following a Sloan postdoctoral fellowship in computational molecular biology, he joined the MIT faculty in 1998. His research interests include statistical inference, graphical models, and large-scale modern estimation problems with predominantly incomplete data.

More from the Same Authors