Pearl: A Foundation Model Advancing Small Molecule Drug Discovery
Abstract
Accelerating small-molecule drug discovery with AI and physics depends fundamentally on accurately predicting how potential drug candidates bind to target proteins in 3D space. Current structure prediction methods are constrained by the small, biased set of experimental structures available in the Protein Data Bank (PDB), and they are prone to generating physically invalid poses. This talk introduces Pearl (Placing Every Atom in the Right Location), our generative foundation model. Pearl addresses this low-data regime by training on large-scale synthetic data and by using an SO(3)-equivariant diffusion module that enforces core geometric principles, which improves generalization and sample efficiency. Pearl establishes a new state of the art in protein-ligand structure prediction, demonstrating up to a 14.5% relative improvement on public benchmarks for generating poses that are both accurate and physically valid (RMSD < 2 Å and PB-valid). The model's novel approach to inference-time conditioning lets users leverage auxiliary structural information in a controlled manner, which makes it substantially more useful for drug discovery programs. On an internal benchmark of protein-ligand systems relevant to small-molecule drug discovery programs, Pearl delivers nearly a 4-fold relative improvement over comparable baselines at the stricter RMSD < 1 Å threshold that matters for medicinal chemistry.

This talk is intended for ML researchers interested in how generative models are being applied to structural biology. Attendees will leave with a better understanding of how diffusion models are applied in drug discovery, and with examples of open problems at the forefront of the field where there is ample room for future research.
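For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how a pose success rate and a relative improvement are typically computed. It is not Pearl's evaluation code. The function names are hypothetical, the RMSD assumes predicted and reference ligand atoms are already matched one-to-one and expressed in the same coordinate frame (no alignment or symmetry correction), and physical validity is passed in as precomputed boolean flags rather than checked here.

```python
import math

def rmsd(pred, ref):
    """RMSD in Å between two matched lists of (x, y, z) atom coordinates.

    Assumes atoms are in the same order and the same frame (an assumption
    of this sketch; real benchmarks also handle ligand symmetry).
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))

def success_rate(rmsds, valid_flags, threshold=2.0):
    """Fraction of poses with RMSD below `threshold` that are also flagged valid.

    `valid_flags` stands in for an external physical-validity check.
    """
    if len(rmsds) != len(valid_flags) or not rmsds:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for r, ok in zip(rmsds, valid_flags) if r < threshold and ok)
    return hits / len(rmsds)

def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, e.g. 0.145 for a 14.5% improvement."""
    return (new - baseline) / baseline

if __name__ == "__main__":
    # Toy numbers for illustration only; these are not results from the talk.
    toy_rmsds = [0.5, 1.5, 2.5, 0.8]
    toy_valid = [True, False, True, True]
    print(success_rate(toy_rmsds, toy_valid, threshold=2.0))
    print(success_rate(toy_rmsds, toy_valid, threshold=1.0))
```

Note that a relative improvement is measured against the baseline's own success rate, so the same absolute gain looks larger when the baseline is low, as is common at the stricter 1 Å threshold.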