DINO: dynamics-informed dataset to overcome the limitations of static molecular data in AI-driven drug discovery
Abstract
Computational drug discovery is limited by its reliance on static molecular structures, which leaves AI models prone to state bias, dynamics blind spots, and unrealistic conformational sampling. As a result, generated candidates often lack the biological plausibility required of a drug and need costly experimental validation. To address this, we propose DINO, a dataset designed to embed molecular motion into AI-driven design. DINO integrates experimental and synthetic molecular dynamics data across membrane proteins, antibodies, nucleic acids, small molecules, and complexes, spanning atom-level to system-level motions. By capturing biophysical conformational ensembles, binding energetics, and functional kinetics, the dataset provides biologically meaningful representations of molecular flexibility and function. This resource will support three classes of tasks: prediction of binding affinity, specificity, stability, and disorder; generation of flexible biomolecules and realistic small-molecule binders; and hybrid tasks such as incorporating static structural models into dynamic landscapes. By grounding molecular design in thermodynamic principles, DINO enables AI models to move beyond static assumptions and generate biochemically plausible candidates with higher therapeutic potential.