Open genomic regions, being accessible to regulatory proteins, could act as the on/off switch or amplifier/attenuator of gene expression, and thus reflect the defining characteristics of cell types. Many previous models make predictions from the sequence to the regulatory region, but the interaction between regulatory regions and genes could be complex and differ between cell types. Moreover, current models usually perform well only on the cell types in the training set and do not generalize to data-scarce scenarios. In this work, we propose a simple yet effective approach for pre-training on genome data in a multi-modal and self-supervised manner, which we call GeneBERT. Specifically, we simultaneously take the 1D sequence of genome data and a 2D matrix of (transcription factors × regions) as the input, and we propose three pre-training tasks to improve the robustness and generalizability of our model. We pre-train our model on the ATAC-seq dataset with 17 million gene sequences. We evaluate GeneBERT on various downstream tasks, including promoter prediction, transcription factor binding site prediction, disease risk estimation, and RNA splicing. Extensive experiments demonstrate the effectiveness of multi-modal and self-supervised pre-training for large-scale genome data.
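To make the two input modalities concrete, here is a minimal sketch of what a 1D sequence input and a 2D (transcription factors × regions) matrix could look like. It only illustrates shapes. The one-hot encoding, the binary motif-hit matrix, and every name used (`one_hot_sequence`, `tf_region_matrix`, `TF_A`, the toy sequences) are assumptions made for illustration, not the encoding GeneBERT actually uses.

```python
# Illustrative only: the encodings and names below are assumptions,
# not taken from the GeneBERT paper or its code.

ALPHABET = "ACGT"


def one_hot_sequence(seq):
    """1D modality: one row per base, one column per letter in ALPHABET.

    Bases outside ALPHABET (for example 'N') become an all-zero row.
    """
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()]


def tf_region_matrix(tf_names, regions, hits):
    """2D modality: rows are transcription factors, columns are regions.

    `hits` is a hypothetical set of (tf_name, region_index) pairs that
    marks where a factor's motif is assumed to occur.
    """
    return [[1 if (tf, j) in hits else 0 for j in range(len(regions))]
            for tf in tf_names]


# Toy data, made up for this example.
regions = ["ACGT", "TTGA"]
tf_names = ["TF_A", "TF_B", "TF_C"]
hits = {("TF_A", 0), ("TF_C", 1)}

sequence_input = [one_hot_sequence(r) for r in regions]  # 2 regions x 4 bases x 4 letters
matrix_input = tf_region_matrix(tf_names, regions, hits)  # 3 factors x 2 regions
```

In this toy layout, a model would receive `sequence_input` and `matrix_input` together for the same set of regions, which is the sense in which the two modalities are taken simultaneously.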
Author Information
Shentong Mo (CMU)
Xi Fu (Columbia University)
Chenyang Hong (The Chinese University of Hong Kong)
Yizhen Chen (The Chinese University of Hong Kong)
Yuxuan Zheng (East China Normal University)
Xiangru Tang (Yale University)
Yanyan Lan (Tsinghua University)
Zhiqiang Shen (CMU)
Eric Xing (Petuum Inc. / Carnegie Mellon University)
More from the Same Authors
-
2021 : Simulated Annealing for Neural Architecture Search »
Shentong Mo · Jingfei Xia · Pinxu Ren -
2021 : Adaptive Fine-tuning for Vision and Language Pre-trained Models »
Shentong Mo · Jingfei Xia · Ihor Markevych -
2022 : Exploring Transformer Backbones for Heterogeneous Treatment Effect Estimation »
Yifan Zhang · Hanlin Zhang · Zachary Lipton · Li Erran Li · Eric Xing -
2023 Poster: Weakly-Supervised Audio-Visual Segmentation »
Shentong Mo · Bhiksha Raj -
2023 Poster: DiffComplete: Diffusion-based Generative 3D Shape Completion »
Ruihang Chu · Enze Xie · Shentong Mo · Zhenguo Li · Matthias Niessner · Chi-Wing Fu · Jiaya Jia -
2023 Poster: DrugCLIP: Contrasive Protein-Molecule Representation Learning for Virtual Screening »
Bowen Gao · Bo Qiang · Haichuan Tan · Yinjun Jia · Minsi Ren · Minsi Lu · Jingjing Liu · Wei-Ying Ma · Yanyan Lan -
2023 Poster: Equivariant Flow Matching with Hybrid Probability Transport for 3D Molecule Generation »
Yuxuan Song · Jingjing Gong · Minkai Xu · Ziyao Cao · Yanyan Lan · Stefano Ermon · Hao Zhou · Wei-Ying Ma -
2023 Poster: DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation »
Shentong Mo · Enze Xie · Ruihang Chu · Lanqing Hong · Matthias Niessner · Zhenguo Li -
2022 : Sample-Specific Contextualized Graphical Models Using Clinical and Molecular Data Reveal Transcriptional Network Heterogeneity Across 7000 Tumors »
Caleb Ellington · Ben Lengerich · Thomas Watkins · Jiekun Yang · Manolis Kellis · Eric Xing -
2022 Spotlight: When Does Group Invariant Learning Survive Spurious Correlations? »
Yimeng Chen · Ruibin Xiong · Zhi-Ming Ma · Yanyan Lan -
2022 Poster: When Does Group Invariant Learning Survive Spurious Correlations? »
Yimeng Chen · Ruibin Xiong · Zhi-Ming Ma · Yanyan Lan -
2022 Poster: Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing »
Shentong Mo · Yapeng Tian -
2022 Poster: A Closer Look at Weakly-Supervised Audio-Visual Source Localization »
Shentong Mo · Pedro Morgado -
2021 Workshop: Math AI for Education (MATHAI4ED): Bridging the Gap Between Research and Smart Education »
Pan Lu · Yuhuai Wu · Sean Welleck · Xiaodan Liang · Eric Xing · James McClelland -
2021 Workshop: 2nd Workshop on Self-Supervised Learning: Theory and Practice »
Pengtao Xie · Ishan Misra · Pulkit Agrawal · Abdelrahman Mohamed · Shentong Mo · Youwei Liang · Jeannette Bohg · Kristina N Toutanova -
2021 : Poster Session 2 (gather.town) »
Wenjie Li · Akhilesh Soni · Jinwuk Seok · Jianhao Ma · Jeffery Kline · Mathieu Tuli · Miaolan Xie · Robert Gower · Quanqi Hu · Matteo Cacciola · Yuanlu Bai · Boyue Li · Wenhao Zhan · Shentong Mo · Junhyung Lyle Kim · Sajad Fathi Hafshejani · Chris Junchi Li · Zhishuai Guo · Harshvardhan Harshvardhan · Neha Wadia · Tatjana Chavdarova · Difan Zou · Zixiang Chen · Aman Gupta · Jacques Chen · Betty Shea · Benoit Dherin · Aleksandr Beznosikov -
2021 Poster: Uncertainty Calibration for Ensemble-Based Debiasing Methods »
Ruibin Xiong · Yimeng Chen · Liang Pang · Xueqi Cheng · Zhi-Ming Ma · Yanyan Lan -
2021 Poster: Multi-task Learning of Order-Consistent Causal Graphs »
Xinshi Chen · Haoran Sun · Caleb Ellington · Eric Xing · Le Song -
2020 : Panel Discussion & Closing »
Yejin Choi · Alexei Efros · Chelsea Finn · Kristen Grauman · Quoc V Le · Yann LeCun · Ruslan Salakhutdinov · Eric Xing -
2020 Workshop: Self-Supervised Learning -- Theory and Practice »
Pengtao Xie · Shanghang Zhang · Pulkit Agrawal · Ishan Misra · Cynthia Rudin · Abdelrahman Mohamed · Wenzhen Yuan · Barret Zoph · Laurens van der Maaten · Xingyi Yang · Eric Xing -
2020 Poster: Regularizing Black-box Models for Improved Interpretability »
Gregory Plumb · Maruan Al-Shedivat · Ángel Alexander Cabrera · Adam Perer · Eric Xing · Ameet Talwalkar -
2020 Poster: AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning »
Hao Zhang · Yuan Li · Zhijie Deng · Xiaodan Liang · Lawrence Carin · Eric Xing -
2020 Poster: Improving GAN Training with Probability Ratio Clipping and Sample Reweighting »
Yue Wu · Pan Zhou · Andrew Wilson · Eric Xing · Zhiting Hu -
2019 : Poster Presentations »
Rahul Mehta · Andrew Lampinen · Binghong Chen · Sergio Pascual-Diaz · Jordi Grau-Moya · Aldo Faisal · Jonathan Tompson · Yiren Lu · Khimya Khetarpal · Martin Klissarov · Pierre-Luc Bacon · Doina Precup · Thanard Kurutach · Aviv Tamar · Pieter Abbeel · Jinke He · Maximilian Igl · Shimon Whiteson · Wendelin Boehmer · Raphaël Marinier · Olivier Pietquin · Karol Hausman · Sergey Levine · Chelsea Finn · Tianhe Yu · Lisa Lee · Benjamin Eysenbach · Emilio Parisotto · Eric Xing · Ruslan Salakhutdinov · Hongyu Ren · Anima Anandkumar · Deepak Pathak · Christopher Lu · Trevor Darrell · Alexei Efros · Phillip Isola · Feng Liu · Bo Han · Gang Niu · Masashi Sugiyama · Saurabh Kumar · Janith Petangoda · Johan Ferret · James McClelland · Kara Liu · Animesh Garg · Robert Lange -
2019 Workshop: Learning with Rich Experience: Integration of Learning Paradigms »
Zhiting Hu · Andrew Wilson · Chelsea Finn · Lisa Lee · Taylor Berg-Kirkpatrick · Ruslan Salakhutdinov · Eric Xing -
2019 Poster: Learning Robust Global Representations by Penalizing Local Predictive Power »
Haohan Wang · Songwei Ge · Zachary Lipton · Eric Xing -
2019 Poster: Learning Data Manipulation for Augmentation and Weighting »
Zhiting Hu · Bowen Tan · Russ Salakhutdinov · Tom Mitchell · Eric Xing -
2019 Poster: Learning Sample-Specific Models with Low-Rank Personalized Regression »
Ben Lengerich · Bryon Aragam · Eric Xing -
2018 Poster: The Sample Complexity of Semi-Supervised Learning with Nonparametric Mixture Models »
Chen Dan · Liu Leqi · Bryon Aragam · Pradeep Ravikumar · Eric Xing -
2018 Poster: Symbolic Graph Reasoning Meets Convolutions »
Xiaodan Liang · Zhiting Hu · Hao Zhang · Liang Lin · Eric Xing -
2018 Poster: DAGs with NO TEARS: Continuous Optimization for Structure Learning »
Xun Zheng · Bryon Aragam · Pradeep Ravikumar · Eric Xing -
2018 Spotlight: DAGs with NO TEARS: Continuous Optimization for Structure Learning »
Xun Zheng · Bryon Aragam · Pradeep Ravikumar · Eric Xing -
2018 Poster: Learning Pipelines with Limited Data and Domain Knowledge: A Study in Parsing Physics Problems »
Mrinmaya Sachan · Kumar Avinava Dubey · Tom Mitchell · Dan Roth · Eric Xing -
2018 Poster: Deep Generative Models with Learnable Knowledge Constraints »
Zhiting Hu · Zichao Yang · Russ Salakhutdinov · Lianhui Qin · Xiaodan Liang · Haoye Dong · Eric Xing -
2018 Poster: Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation »
Yuan Li · Xiaodan Liang · Zhiting Hu · Eric Xing -
2018 Poster: Neural Architecture Search with Bayesian Optimisation and Optimal Transport »
Kirthevasan Kandasamy · Willie Neiswanger · Jeff Schneider · Barnabas Poczos · Eric Xing -
2018 Spotlight: Neural Architecture Search with Bayesian Optimisation and Optimal Transport »
Kirthevasan Kandasamy · Willie Neiswanger · Jeff Schneider · Barnabas Poczos · Eric Xing -
2018 Poster: Unsupervised Text Style Transfer using Language Models as Discriminators »
Zichao Yang · Zhiting Hu · Chris Dyer · Eric Xing · Taylor Berg-Kirkpatrick -
2017 Poster: Structured Generative Adversarial Networks »
Zhijie Deng · Hao Zhang · Xiaodan Liang · Luona Yang · Shizhen Xu · Jun Zhu · Eric Xing -
2016 : Eric Xing »
Eric Xing -
2016 Poster: Variance Reduction in Stochastic Gradient Langevin Dynamics »
Kumar Avinava Dubey · Sashank J. Reddi · Sinead Williamson · Barnabas Poczos · Alexander Smola · Eric Xing -
2016 Poster: Learning HMMs with Nonparametric Emissions via Spectral Decompositions of Continuous Matrices »
Kirthevasan Kandasamy · Maruan Al-Shedivat · Eric Xing -
2016 Poster: Stochastic Variational Deep Kernel Learning »
Andrew Wilson · Zhiting Hu · Russ Salakhutdinov · Eric Xing -
2015 Workshop: Nonparametric Methods for Large Scale Representation Learning »
Andrew G Wilson · Alexander Smola · Eric Xing -
2015 Poster: The Human Kernel »
Andrew Wilson · Christoph Dann · Chris Lucas · Eric Xing -
2015 Spotlight: The Human Kernel »
Andrew Wilson · Christoph Dann · Chris Lucas · Eric Xing -
2014 Workshop: Modern Nonparametrics 3: Automating the Learning Pipeline »
Eric Xing · Mladen Kolar · Arthur Gretton · Samory Kpotufe · Han Liu · Zoltán Szabó · Alan Yuille · Andrew G Wilson · Ryan Tibshirani · Sasha Rakhlin · Damian Kozbur · Bharath Sriperumbudur · David Lopez-Paz · Kirthevasan Kandasamy · Francesco Orabona · Andreas Damianou · Wacha Bounliphone · Yanshuai Cao · Arijit Das · Yingzhen Yang · Giulia DeSalvo · Dmitry Storcheus · Roberto Valerio -
2014 Workshop: Modern Machine Learning and Natural Language Processing »
Ankur P Parikh · Avneesh Saluja · Chris Dyer · Eric Xing -
2014 Poster: On Model Parallelization and Scheduling Strategies for Distributed Machine Learning »
Seunghak Lee · Jin Kyu Kim · Xun Zheng · Qirong Ho · Garth Gibson · Eric Xing -
2014 Poster: Dependent nonparametric trees for dynamic hierarchical clustering »
Kumar Avinava Dubey · Qirong Ho · Sinead Williamson · Eric Xing -
2013 Poster: More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server »
Qirong Ho · James Cipar · Henggang Cui · Seunghak Lee · Jin Kyu Kim · Phillip B. Gibbons · Garth Gibson · Greg Ganger · Eric Xing -
2013 Oral: More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server »
Qirong Ho · James Cipar · Henggang Cui · Seunghak Lee · Jin Kyu Kim · Phillip B. Gibbons · Garth Gibson · Greg Ganger · Eric Xing -
2013 Poster: Variance Reduction for Stochastic Gradient Optimization »
Chong Wang · Xi Chen · Alexander Smola · Eric Xing -
2013 Poster: Restricting exchangeable nonparametric distributions »
Sinead Williamson · Steven MacEachern · Eric Xing -
2013 Spotlight: Restricting exchangeable nonparametric distributions »
Sinead Williamson · Steven MacEachern · Eric Xing -
2013 Poster: A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks »
Junming Yin · Qirong Ho · Eric Xing -
2012 Workshop: Spectral Algorithms for Latent Variable Models »
Ankur P Parikh · Le Song · Eric Xing -
2012 Poster: Monte Carlo Methods for Maximum Margin Supervised Topic Models »
Qixia Jiang · Jun Zhu · Maosong Sun · Eric Xing -
2012 Poster: On Triangular versus Edge Representations --- Towards Scalable Modeling of Networks »
Qirong Ho · Junming Yin · Eric Xing -
2012 Poster: Symmetric Correspondence Topic Models for Multilingual Text Analysis »
Kosuke Fukumasu · Koji Eguchi · Eric Xing -
2012 Spotlight: Symmetric Correspondence Topic Models for Multilingual Text Analysis »
Kosuke Fukumasu · Koji Eguchi · Eric Xing -
2011 Poster: Infinite Latent SVM for Classification and Multi-task Learning »
Jun Zhu · Ning Chen · Eric Xing -
2011 Poster: Kernel Embeddings of Latent Tree Graphical Models »
Le Song · Ankur P Parikh · Eric Xing -
2011 Poster: Large-Scale Category Structure Aware Image Categorization »
Bin Zhao · Li Fei-Fei · Eric Xing -
2010 Poster: Large Margin Learning of Upstream Scene Understanding Models »
Jun Zhu · Li-Jia Li · Li Fei-Fei · Eric Xing -
2010 Poster: Predictive Subspace Learning for Multi-view Data: a Large Margin Approach »
Ning Chen · Jun Zhu · Eric Xing -
2010 Poster: Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification »
Li-Jia Li · Hao Su · Eric Xing · Li Fei-Fei -
2010 Poster: Adaptive Multi-Task Lasso: with Application to eQTL Detection »
Seunghak Lee · Jun Zhu · Eric Xing -
2009 Poster: Heterogeneous multitask learning with joint sparsity constraints »
Xiaolin Yang · Seyoung Kim · Eric Xing -
2009 Poster: Time-Varying Dynamic Bayesian Networks »
Le Song · Mladen Kolar · Eric Xing -
2009 Spotlight: Time-Varying Dynamic Bayesian Networks »
Le Song · Mladen Kolar · Eric Xing -
2009 Poster: Sparsistent Learning of Varying-coefficient Models with Structural Changes »
Mladen Kolar · Le Song · Eric Xing -
2009 Spotlight: Sparsistent Learning of Varying-coefficient Models with Structural Changes »
Mladen Kolar · Le Song · Eric Xing -
2008 Workshop: Analyzing Graphs: Theory and Applications »
Edo M Airoldi · David Blei · Jake M Hofman · Tony Jebara · Eric Xing -
2008 Poster: Mixed Membership Stochastic Blockmodels »
Edo M Airoldi · David Blei · Stephen E Fienberg · Eric Xing -
2008 Spotlight: Mixed Membership Stochastic Blockmodels »
Edo M Airoldi · David Blei · Stephen E Fienberg · Eric Xing -
2008 Poster: Partially Observed Maximum Entropy Discrimination Markov Networks »
Jun Zhu · Eric Xing · Bo Zhang -
2007 Workshop: Statistical Network Models »
Kevin Murphy · Lise Getoor · Eric Xing · Raphael Gottardo -
2007 Poster: HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation »
Bing Zhao · Eric Xing -
2006 Poster: A Hidden Markov Dirichlet Process Model for Genetic Recombination in Open Ancestral Space »
KyungAh Sohn · Eric Xing -
2006 Talk: A Hidden Markov Dirichlet Process Model for Genetic Recombination in Open Ancestral Space »
KyungAh Sohn · Eric Xing