Structure-Inducing Pre-training
Matthew McDermott · Brendan Yap · Peter Szolovits · Marinka Zitnik
Event URL: https://openreview.net/forum?id=gL378GXc1kA
Language model pre-training and derived methods are incredibly impactful in machine learning. However, there remains considerable uncertainty about exactly why pre-training improves performance on fine-tuning tasks. This is especially true when adapting language-model pre-training to domains outside of natural language. Here, we analyze this problem by exploring how existing pre-training methods impose relational structure in their induced per-sample latent spaces; that is, what constraints do pre-training methods impose on the distance or geometry between the pre-trained embeddings of two samples $\boldsymbol{x}_i$ and $\boldsymbol{x}_j$? Through a comprehensive review of existing pre-training methods, we find that this question remains open, despite theoretical analyses demonstrating the importance of understanding this form of induced structure. Based on this review, we introduce a descriptive framework for pre-training that allows for a granular, comprehensive understanding of how relational structure can be induced. We present a theoretical analysis of this framework from first principles and establish a connection between the relational inductive bias of pre-training and fine-tuning performance. We also show how to use the framework to define new pre-training methods. We build on these findings with empirical studies on benchmarks spanning three data modalities and ten fine-tuning tasks. These experiments validate our theoretical analyses, inform the design of novel pre-training methods, and establish consistent improvements over a compelling suite of baseline methods.
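As a rough illustration of the kind of relational constraint discussed above, consider an objective that penalizes the distance between embeddings of sample pairs known to be related and pushes unrelated pairs apart. This is only a minimal sketch of the general idea, not the paper's actual method; the function name, margin value, and toy data below are all hypothetical:

```python
import torch

def structure_inducing_penalty(z, related, margin=1.0):
    """Hypothetical regularizer on per-sample embeddings z (n x d):
    pull embeddings of related pairs together, push every other
    pair at least `margin` apart (a contrastive-style hinge)."""
    loss = z.new_zeros(())
    n = z.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            d = torch.norm(z[i] - z[j])
            if (i, j) in related:
                loss = loss + d ** 2  # attract related pairs
            else:
                # repel unrelated pairs that are closer than the margin
                loss = loss + torch.clamp(margin - d, min=0) ** 2
    return loss

# Toy usage: 4 random embeddings, pairs (0, 1) and (2, 3) marked related.
z = torch.randn(4, 8, requires_grad=True)
loss = structure_inducing_penalty(z, {(0, 1), (2, 3)})
loss.backward()  # gradients pull related pairs together, push others apart
```

Minimizing such a penalty alongside a standard pre-training loss is one way a method can impose geometric structure on the latent space; the paper's framework characterizes pre-training methods by exactly which constraints of this form they induce.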
Author Information
Matthew McDermott
Brendan Yap (Massachusetts Institute of Technology)
Peter Szolovits (MIT)
Marinka Zitnik (Harvard University)
More from the Same Authors
- 2023 Workshop: AI for Science: from Theory to Practice
  Yuanqi Du · Max Welling · Yoshua Bengio · Marinka Zitnik · Carla Gomes · Jure Leskovec · Maria Brbic · Wenhao Gao · Kexin Huang · Ziming Liu · Rocío Mercado · Miles Cranmer · Shengchao Liu · Lijing Wang
- 2022: Keynote
  Marinka Zitnik
- 2022 Workshop: New Frontiers in Graph Learning
  Jiaxuan You · Marinka Zitnik · Rex Ying · Yizhou Sun · Hanjun Dai · Stefanie Jegelka
- 2022 Workshop: AI for Science: Progress and Promises
  Yi Ding · Yuanqi Du · Tianfan Fu · Hanchen Wang · Anima Anandkumar · Yoshua Bengio · Anthony Gitter · Carla Gomes · Aviv Regev · Max Welling · Marinka Zitnik
- 2022 Poster: OpenXAI: Towards a Transparent Evaluation of Model Explanations
  Chirag Agarwal · Satyapriya Krishna · Eshika Saxena · Martin Pawelczyk · Nari Johnson · Isha Puri · Marinka Zitnik · Himabindu Lakkaraju
- 2022 Poster: Self-Supervised Contrastive Pre-Training For Time Series via Time-Frequency Consistency
  Xiang Zhang · Ziyuan Zhao · Theodoros Tsiligkaridis · Marinka Zitnik
- 2021: Spotlight talks: new datasets and research finalists
  Roxana Daneshjou · Sharmita Dey · Sabri Boughorbel · Matthew McDermott · Daniel Gedon
- 2021 Workshop: AI for Science: Mind the Gaps
  Payal Chandak · Yuanqi Du · Tianfan Fu · Wenhao Gao · Kexin Huang · Shengchao Liu · Ziming Liu · Gabriel Spadon · Max Tegmark · Hanchen Wang · Adrian Weller · Max Welling · Marinka Zitnik
- 2020 Poster: Open Graph Benchmark: Datasets for Machine Learning on Graphs
  Weihua Hu · Matthias Fey · Marinka Zitnik · Yuxiao Dong · Hongyu Ren · Bowen Liu · Michele Catasta · Jure Leskovec
- 2020 Poster: Graph Meta Learning via Local Subgraphs
  Kexin Huang · Marinka Zitnik
- 2020 Spotlight: Open Graph Benchmark: Datasets for Machine Learning on Graphs
  Weihua Hu · Matthias Fey · Marinka Zitnik · Yuxiao Dong · Hongyu Ren · Bowen Liu · Michele Catasta · Jure Leskovec
- 2020 Poster: GNNGuard: Defending Graph Neural Networks against Adversarial Attacks
  Xiang Zhang · Marinka Zitnik
- 2020 Poster: Subgraph Neural Networks
  Emily Alsentzer · Samuel Finlayson · Michelle Li · Marinka Zitnik
- 2020 Poster: Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation
  Aaron Sonabend · Junwei Lu · Leo Anthony Celi · Tianxi Cai · Peter Szolovits
- 2020 Demonstration: MolDesigner: Interactive Design of Efficacious Drugs with Deep Learning
  Kexin Huang · Tianfan Fu · Dawood Khan · Ali Abid · Ali Abdalla · Abubaker Abid · Lucas Glass · Marinka Zitnik · Cao Xiao · Jimeng Sun