
Workshop: AI for Science: Progress and Promises

Automated Protein Function Description for Novel Class Discovery

Meet Barot · Vladimir Gligorijevic · Richard Bonneau · Kyunghyun Cho

Keywords: [ Deep Learning ] [ transformers ] [ protein function prediction ] [ Neural Machine Translation ] [ Text generation ]


Knowledge of protein function is necessary for understanding biological systems, but high-throughput sequencing discovers new sequences far faster than they can be functionally characterized. Beyond assigning newly sequenced proteins to known functions, a more challenging problem is discovering novel protein functions, and for designed proteins the space of possible functions is effectively unlimited. When framed as Gene Ontology (GO) term prediction, protein function prediction is a multilabel classification problem with a hierarchical label space. However, this framing offers no guiding principle for discovering completely novel functions.

Here we propose a neural machine translation model that generates descriptions of protein function in natural language. Instead of making predictions in a fixed label space, our model generates descriptions in the language space and can therefore produce novel functional descriptions. Because this approach is new, we design three metrics to evaluate our model: correctness, specificity, and robustness. We report results in the zero-shot classification setting, scoring functional descriptions that the model has not seen before for proteins with limited homology to those in the training set. Finally, for qualitative evaluation, we compare generated function descriptions with ground-truth descriptions.
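The zero-shot setting above scores functional descriptions the model has never seen. One common way to do this with a sequence-to-sequence model, assumed here rather than confirmed as the authors' procedure, is to rank candidate descriptions by the model's length-normalized log-likelihood given the protein sequence. In this minimal sketch, `toy_log_probs` is a hypothetical stand-in for a trained translation model, and `score_description` and `zero_shot_classify` are illustrative names.

```python
import math
import re
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: given a protein sequence and a tokenized description,
# return the log-probability the model assigns to each description token.
TokenLogProbFn = Callable[[str, Sequence[str]], List[float]]


def toy_log_probs(protein: str, tokens: Sequence[str]) -> List[float]:
    """Toy stand-in for a trained seq2seq model (illustrative only).

    Proteins containing an HEXXH-like pattern favor protease-related words;
    all other proteins favor DNA-binding words.
    """
    if re.search(r"HE..H", protein):
        favored = {"zinc", "metalloprotease", "activity"}
    else:
        favored = {"dna", "binding"}
    return [math.log(0.5) if t in favored else math.log(0.01) for t in tokens]


def score_description(log_probs_fn: TokenLogProbFn, protein: str, description: str) -> float:
    """Length-normalized log-likelihood of a description given a protein."""
    tokens = description.lower().split()
    if not tokens:
        return float("-inf")  # an empty description cannot be scored
    log_probs = log_probs_fn(protein, tokens)
    # Averaging over tokens keeps longer descriptions from being penalized
    # simply for having more terms in the sum.
    return sum(log_probs) / len(log_probs)


def zero_shot_classify(
    log_probs_fn: TokenLogProbFn, protein: str, candidates: Sequence[str]
) -> Tuple[str, Dict[str, float]]:
    """Pick the unseen candidate description the model finds most likely."""
    scores = {c: score_description(log_probs_fn, protein, c) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    candidates = ["zinc metalloprotease activity", "DNA binding", "transmembrane transport"]
    best, scores = zero_shot_classify(toy_log_probs, "MKTAHEAGHLLV", candidates)
    print(best)
    for description, score in scores.items():
        print(f"{score:8.3f}  {description}")
```

Length normalization is one design choice among several. Summed log-likelihood would systematically favor shorter candidate descriptions, which matters when candidates differ in length.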
