

Poster in Workshop: Machine Learning in Structural Biology Workshop

Protein language models learn evolutionary statistics of interacting sequence motifs

Zhidian Zhang · Hannah Wayment-Steele · Garyk Brixi · Matteo Dal Peraro · Dorothee Kern · Sergey Ovchinnikov


Abstract:

Protein language models (pLMs) have emerged as potent tools for predicting protein structures and designing proteins, yet it is unknown to what degree these models actually understand the inherent biophysics of protein structure. Motivated by the discovery that pLMs erroneously predict non-physical structure fragments for protein isoforms, we investigated the sequence context needed for contact predictions in ESM2, both by developing a "categorical Jacobian" approach, which provides a completely unsupervised way of assessing the coevolutionary signal stored in a model, and by artificially modifying sequences. We found that pLMs make contact predictions conditioned on sequence motifs and on the relative linear distance between segment pairs. Our investigation highlights the limitations of current pLMs and underscores the importance of understanding the underlying mechanisms of these models.
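The "categorical Jacobian" idea can be sketched as follows: mutate each position of a sequence to each possible amino acid, record how the model's output logits change, and reduce the resulting fourth-order tensor to a position-pair coupling map. The sketch below is a minimal, self-contained illustration of that recipe; the toy linear `logits` function is a stand-in assumption (a real application would query ESM2), and the Frobenius-norm and APC reduction is the standard coevolution-analysis convention, not necessarily the paper's exact implementation.

```python
import numpy as np

A = 20  # alphabet size (amino acids)
L = 8   # sequence length (small for illustration)

rng = np.random.default_rng(0)
# Toy "model" weights (assumption): an arbitrary fixed linear map
W = rng.normal(size=(L * A, L * A))

def logits(seq):
    """Toy stand-in for a pLM: one-hot encode the sequence, apply a linear map."""
    onehot = np.zeros((L, A))
    onehot[np.arange(L), seq] = 1.0
    return (onehot.reshape(-1) @ W).reshape(L, A)

def categorical_jacobian(seq):
    """J[i, a, j, b] = change in logit (j, b) when position i is mutated to a."""
    base = logits(seq)
    J = np.zeros((L, A, L, A))
    for i in range(L):
        for a in range(A):
            mutant = seq.copy()
            mutant[i] = a
            J[i, a] = logits(mutant) - base
    return J

def contact_map(J):
    """Reduce to an L x L coupling matrix: Frobenius norm over amino-acid
    dimensions, symmetrize, then apply the average-product correction (APC)."""
    C = np.linalg.norm(J, axis=(1, 3))
    C = 0.5 * (C + C.T)
    apc = C.mean(axis=0, keepdims=True) * C.mean(axis=1, keepdims=True) / C.mean()
    return C - apc

seq = rng.integers(0, A, size=L)
cmap = contact_map(categorical_jacobian(seq))
print(cmap.shape)  # (8, 8)
```

Because every entry comes from a forward pass plus a subtraction, no supervised contact head or retraining is needed, which is what makes the approach a fully unsupervised probe of the coevolutionary statistics a model has internalized.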
