Small datasets are ubiquitous in drug discovery, as data generation is expensive and can be restricted for ethical reasons (e.g. in vivo experiments). A widely applied technique in early drug discovery for identifying novel active molecules against a protein target is modelling quantitative structure-activity relationships (QSAR). QSAR modelling is known to be extremely challenging, as the available measurements of compound activities number only in the low dozens or hundreds. However, many such related datasets exist, each with a small number of datapoints, opening up the opportunity for few-shot learning after pre-training on a substantially larger corpus of data. At the same time, many few-shot learning methods are currently evaluated in the computer-vision domain. We propose that expansion into a new application, together with the possibility of using explicitly graph-structured data, will drive exciting progress in few-shot learning. Here, we provide a few-shot learning dataset (FS-Mol) and a complementary benchmarking procedure. We define a set of tasks on which few-shot learning methods can be evaluated, with a separate set of tasks for use in pre-training. In addition, we implement and evaluate a number of existing single-task, multi-task, and meta-learning approaches as baselines for the community. We hope that our dataset, supporting code release, and baselines will encourage future work in this extremely challenging new domain for few-shot learning.
Author Information
Megan Stanley (Microsoft Research)
John Bronskill (University of Cambridge)
Krzysztof Maziarz (Microsoft Research)
Hubert Misztela (AI Innovation Center, Novartis)
Jessica Lanini (Novartis)
Marwin Segler (WWU Münster / MSR)
Nadine Schneider (Novartis)
Marc Brockschmidt (Microsoft Research)