Predicting the physical interaction of proteins is a cornerstone problem in computational biology. New learning-based algorithms are typically trained end-to-end on protein structures extracted from the Protein Data Bank. However, these training datasets tend to be large and difficult to use for prototyping and, unlike image or natural language datasets, they are not easily interpretable by non-experts. In this paper, we propose Dock2D-IP and Dock2D-FI, two toy datasets that can be used to select algorithms that predict protein-protein interactions (or any other type of molecular interaction). Using two-dimensional shapes as input, each example from Dock2D-FI describes the fact of interaction (FI) between two shapes, and each example from Dock2D-IP describes the interaction pose (IP) of two shapes known to interact. We propose baselines that represent different approaches to the problem and demonstrate the potential for transfer learning across the IP prediction and FI prediction tasks.
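To make the distinction between the two tasks concrete, the following is a minimal sketch of what one example from each dataset might look like. The class names, field names, grid encoding, and pose parameterization (a rotation angle plus a translation) are illustrative assumptions and not the actual Dock2D file format or API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical encoding: a 2D shape as a binary occupancy grid.
Shape2D = List[List[int]]


@dataclass
class FIExample:
    """Fact-of-interaction example: do the two shapes interact?"""
    receptor: Shape2D
    ligand: Shape2D
    interacts: bool


@dataclass
class IPExample:
    """Interaction-pose example for a pair of shapes known to interact."""
    receptor: Shape2D
    ligand: Shape2D
    rotation: float               # assumed: ligand rotation angle, in radians
    translation: Tuple[int, int]  # assumed: ligand shift, in grid cells


def to_fi(example: IPExample) -> FIExample:
    # An IP example describes a pair known to interact,
    # so the corresponding FI label is positive.
    return FIExample(example.receptor, example.ligand, interacts=True)


# Toy shapes, invented purely for illustration.
square = [[1, 1], [1, 1]]
notch = [[1, 0], [1, 1]]

ip = IPExample(receptor=square, ligand=notch, rotation=0.0, translation=(2, 0))
fi = to_fi(ip)
```

Under these assumptions, the FI task reduces to binary classification over pairs of shapes, while the IP task is a regression or search over rigid-body poses of one shape relative to the other.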
Georgy Derevyanko (Concordia University)
Sid Bhadra-Lobo (Rutgers University)
Guillaume Lamoureux (Rutgers University)