Epitope vaccines are a promising direction for precision treatment of cancer, autoimmune diseases, and allergies. Effectively designing such vaccines requires accurate prediction of proteasomal cleavage in order to ensure that the epitopes in the vaccine are presented to T cells by the major histocompatibility complex (MHC). While direct identification of proteasomal cleavage in vitro is cumbersome and low throughput, cleavage events can be implicitly inferred from the termini of MHC-presented epitopes, which can be detected in large quantities thanks to recent advances in high-throughput MHC ligandomics. Inferring cleavage events in this way yields an inherently noisy signal, which recent developments in deep learning claim to address by learning predictors from noisy labels. Inspired by such innovations, we sought to modernize proteasomal cleavage predictors by benchmarking a wide range of recent methods, including LSTMs, transformers, CNNs, and denoising methods, on a recently introduced cleavage dataset. We found that increasing model scale and complexity delivered limited performance gains, as several methods reached about 88.5% AUC on C-terminal and 79.5% AUC on N-terminal cleavage prediction. This suggests that the noise and/or complexity of proteasomal cleavage and the subsequent biological processes of the antigen processing pathway, rather than the specific modeling approach, are the major limiting factors for predictive performance. While biological complexity can be tackled with more data and better models, noise and randomness inherently limit the maximum achievable predictive performance. All our datasets and experiments are available at https://anonymous.4open.science/r/cleavage_prediction-E8FD.
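The benchmark setup described above can be illustrated with a minimal sketch: cleavage prediction framed as binary classification over fixed-length residue windows around a candidate cleavage site, scored with ROC AUC. Everything here is hypothetical and simplified (the window length, the one-hot encoding, and the synthetic data with a hydrophobic bias at the cleavage position are illustrative assumptions, not the paper's actual dataset or models):

```python
# Hypothetical sketch of the evaluation setup: binary classification of
# residue windows with ROC AUC as the metric. Data below is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(window: str) -> np.ndarray:
    """One-hot encode a residue window into a flat feature vector."""
    vec = np.zeros((len(window), len(AMINO_ACIDS)))
    for pos, aa in enumerate(window):
        vec[pos, AA_INDEX[aa]] = 1.0
    return vec.ravel()

rng = np.random.default_rng(0)
WINDOW_LEN = 10  # residues flanking the candidate cleavage site (assumed)

def sample_window(positive: bool) -> str:
    """Synthetic window; positives get a hydrophobic residue at the center,
    loosely mimicking known C-terminal cleavage preferences."""
    residues = rng.choice(list(AMINO_ACIDS), size=WINDOW_LEN)
    if positive:
        residues[WINDOW_LEN // 2] = rng.choice(list("LFVI"))
    return "".join(residues)

labels = rng.integers(0, 2, size=2000)
X = np.stack([one_hot(sample_window(bool(y))) for y in labels])

# Simple baseline classifier; the paper benchmarks LSTMs, transformers,
# CNNs, and denoising methods instead.
model = LogisticRegression(max_iter=1000).fit(X[:1500], labels[:1500])
scores = model.predict_proba(X[1500:])[:, 1]
auc = roc_auc_score(labels[1500:], scores)
print(f"held-out AUC: {auc:.3f}")
```

In the paper's setting the labels themselves are noisy (cleavage is inferred from epitope termini rather than observed directly), which is what caps the achievable AUC regardless of the model class.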
Author Information
Ingo Ziegler (Ludwig-Maximilians-Universität München)
Initially interested in the finance industry, I started my education at the Bachelor's level with a business degree in Finance & Accounting. After various internships in corporate finance and audit, I became more and more passionate about quantitative areas such as machine learning. Completing a machine learning and data science degree at EPFL's Extension School helped me bridge the gap between my business background and a full undergraduate computer science education. Currently, I'm a Master's student in computational linguistics at LMU Munich, focusing on representation and transfer learning.
Bolei Ma (Ludwig-Maximilians-Universität München)
Ercong Nie (Ludwig-Maximilians-Universität München)
Hi, I am Ercong. Currently, I am a first-year PhD student in computational linguistics / natural language processing at the Center for Information and Language Processing (CIS), LMU Munich. Previously, I obtained my M.Sc. degree in computational linguistics and informatics at LMU Munich. I finished my bachelor's degree in German linguistics and Finance at Shanghai Jiao Tong University (SJTU). I also spent one exchange semester at the University of Heidelberg studying German linguistics. My research interests are cross-lingual transfer learning, zero- and few-shot learning, and multilinguality.
Bernd Bischl (LMU)
David Rügamer (LMU Munich)
Benjamin Schubert (Institute of Computational Biology)
Emilio Dorigatti (Ludwig Maximilian Universitaet Muenchen)
More from the Same Authors
- 2021: Survival-oriented embeddings for improving accessibility to complex data structures (Tobias Weber · Bernd Bischl · David Rügamer)
- 2021: Towards modelling hazard factors in unstructured data spaces using gradient-based latent interpolation (Tobias Weber · Michael Ingrisch · Bernd Bischl · David Rügamer)
- 2021: Identifying the atmospheric drivers of drought and heat using a smoothed deep learning approach (David Rügamer)
- 2022: meTCRs - Learning a metric for T-cell receptors (Felix Drost · Lennard Schiefelbein · Benjamin Schubert)
- 2022: Transformer Model for Genome Sequence Analysis (Noah Hurmer · Xiao-Yin To · Martin Binder · Hüseyin Anil Gündüz · Philipp Münch · René Mreches · Alice McHardy · Bernd Bischl · Mina Rezaei)
- 2022: Uncertainty-aware predictive modeling for fair data-driven decisions (Patrick Kaiser · Christoph Kern · David Rügamer)
- 2022 Poster: FiLM-Ensemble: Probabilistic Deep Learning via Feature-wise Linear Modulation (Mehmet Ozgur Turkoglu · Alexander Becker · Hüseyin Anil Gündüz · Mina Rezaei · Bernd Bischl · Rodrigo Caye Daudt · Stefano D'Aronco · Jan Wegner · Konrad Schindler)