Poster
Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost
Sungjun Cho · Seonwoo Min · Jinwoo Kim · Moontae Lee · Honglak Lee · Seunghoon Hong
To overcome the quadratic cost of self-attention, recent works have proposed various sparse attention modules, most of which fall under one of two groups: 1) sparse attention under hand-crafted patterns and 2) full attention followed by a sparse variant of softmax such as $\alpha$-entmax. Unfortunately, the first group lacks adaptability to data while the second still requires quadratic cost in training. In this work, we propose SBM-Transformer, a model that resolves both problems by endowing each attention head with a mixed-membership Stochastic Block Model (SBM). Then, each attention head data-adaptively samples a bipartite graph, the adjacency of which is used as an attention mask for each input. During backpropagation, a straight-through estimator is used to flow gradients beyond the discrete sampling step and adjust the probabilities of sampled edges based on the predictive loss. The forward and backward costs are thus linear in the number of edges, which each attention head can also choose flexibly based on the input. By assessing the distribution of graphs, we theoretically show that SBM-Transformer is a universal approximator for arbitrary sequence-to-sequence functions in expectation. Empirical evaluations under the LRA and GLUE benchmarks demonstrate that our model outperforms previous efficient variants as well as the original Transformer with full attention. Our implementation can be found at https://github.com/sc782/SBM-Transformer.
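As a rough illustration of the forward pass described in the abstract, the sketch below samples a bipartite attention mask from a mixed-membership SBM and then runs softmax attention only over the sampled edges. It is not the authors' implementation (see the repository linked above for that). Every name here (`edge_probabilities`, `sample_mask`, `masked_attention`, `q_member`, `k_member`, `block`) is hypothetical, and the parameterization is an assumption: memberships are softmax-normalized so that each edge probability `q_member[i]^T · block · k_member[j]` lies in [0, 1]. The sketch computes all edge probabilities densely for clarity, so it does not reproduce the linear-cost sampling, and it omits the straight-through estimator, which needs an autodiff framework.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def edge_probabilities(q_member, k_member, block):
    """Assumed SBM form: p[i][j] = q_member[i]^T * block * k_member[j]."""
    probs = []
    for qi in q_member:
        qb = [dot(qi, col) for col in zip(*block)]  # row vector qi^T * block
        probs.append([dot(qb, kj) for kj in k_member])
    return probs


def sample_mask(probs, rng):
    """Draw one independent Bernoulli edge per (query, key) pair."""
    return [[1 if rng.random() < p else 0 for p in row] for row in probs]


def masked_attention(queries, keys, values, mask):
    """Softmax attention over sampled edges only.

    Work is proportional to the number of edges in `mask`.
    A query with no sampled edge outputs a zero vector (a choice made here).
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    width = len(values[0])
    out = []
    for qi, mask_row in zip(queries, mask):
        idx = [j for j, edge in enumerate(mask_row) if edge]
        if not idx:
            out.append([0.0] * width)
            continue
        weights = softmax([scale * dot(qi, keys[j]) for j in idx])
        out.append([sum(w * values[j][c] for w, j in zip(weights, idx))
                    for c in range(width)])
    return out


# Toy input: n tokens, feature size d, k clusters (all values are made up).
rng = random.Random(0)
n, d, k = 4, 3, 2
queries = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
keys = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
values = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
q_member = [softmax([rng.gauss(0, 1) for _ in range(k)]) for _ in range(n)]
k_member = [softmax([rng.gauss(0, 1) for _ in range(k)]) for _ in range(n)]
block = [[0.9, 0.1], [0.1, 0.9]]  # dense within clusters, sparse across

probs = edge_probabilities(q_member, k_member, block)
mask = sample_mask(probs, rng)
output = masked_attention(queries, keys, values, mask)
print("sampled edges:", sum(map(sum, mask)), "of", n * n)
```

Because the mask is resampled per input, the number of edges, and with it the attention cost, varies with the data; a `block` matrix close to all ones recovers nearly full attention, while one close to all zeros yields a nearly empty mask.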
Author Information
Sungjun Cho (LG AI Research)
Seonwoo Min (Seoul National University)
Jinwoo Kim (KAIST)
Moontae Lee (University of Illinois at Chicago)
Honglak Lee (LG AI Research / U. Michigan)
Seunghoon Hong (Korea Advanced Institute of Science and Technology)
More from the Same Authors
-
2021 : Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks »
Yijie Guo · Qiucheng Wu · Honglak Lee -
2021 : Fast Inference and Transfer of Compositional Task for Few-shot Task Generalization »
Sungryull Sohn · Hyunjae Woo · Jongwook Choi · Izzeddin Gur · Aleksandra Faust · Honglak Lee -
2021 : Learning Parameterized Task Structure for Generalization to Unseen Entities »
Anthony Liu · Sungryull Sohn · Honglak Lee -
2021 : SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning »
Jongjin Park · Younggyo Seo · Jinwoo Shin · Honglak Lee · Pieter Abbeel · Kimin Lee -
2021 : Learning compositional tasks from language instructions »
Lajanugen Logeswaran · Wilka Carvalho · Honglak Lee -
2022 : Allele-conditional attention mechanism for HLA-peptide complex binding affinity prediction »
Rodrigo Hormazabal · Doyeong Hwang · Kiyoung Kim · Sehui Han · Kyunghoon Bae · Honglak Lee -
2022 : Dynamics-Augmented Decision Transformer for Offline Dynamics Generalization »
Changyeon Kim · Junsu Kim · Younggyo Seo · Kimin Lee · Honglak Lee · Jinwoo Shin -
2022 : Learning Exploration Policies with View-based Intrinsic Rewards »
Yijie Guo · Yao Fu · Run Peng · Honglak Lee -
2023 Poster: CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation »
Sihan Xu · Ziqiao Ma · Yidong Huang · Honglak Lee · Joyce Chai -
2023 Poster: SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations »
Youngsoo Jang · Geon-Hyeong Kim · Jongmin Lee · Sungryull Sohn · Byoungjip Kim · Honglak Lee · Moontae Lee -
2023 Poster: Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance »
Jinwoo Kim · Dat Nguyen · Ayhan Suleymanzade · Hyeokjun An · Seunghoon Hong -
2023 Poster: Discovering Representations for Transfer with Successor Features and the Deep Option Keyboard »
Wilka Carvalho · Andre Saraiva · Angelos Filos · Andrew Lampinen · Loic Matthey · Richard L Lewis · Honglak Lee · Satinder Singh · Danilo Jimenez Rezende · Daniel Zoran -
2023 Poster: Guide Your Agent with Adaptive Multimodal Rewards »
Changyeon Kim · Younggyo Seo · Hao Liu · Lisa Lee · Jinwoo Shin · Honglak Lee · Kimin Lee -
2023 Poster: Projection Regret: Reducing Background Bias for Novelty Detection via Diffusion Models »
Sungik Choi · Hankook Lee · Honglak Lee · Moontae Lee -
2022 : ReSPack: A Large-Scale Rectilinear Steiner Tree Packing Data Generator and Benchmark »
Kanghoon Lee · Youngjoon Park · Han-Seul Jeong · Deunsol Yoon · Sunghoon Hong · Sungryull Sohn · Minu Kim · Hanbum Ko · Moontae Lee · Honglak Lee · Kyunghoon Kim · Euihyuk Kim · Seonggeon Cho · Jaesang Min · Woohyung Lim -
2022 Poster: Transferring Pre-trained Multimodal Representations with Cross-modal Similarity Matching »
Byoungjip Kim · Sungik Choi · Dasol Hwang · Moontae Lee · Honglak Lee -
2022 Poster: Pure Transformers are Powerful Graph Learners »
Jinwoo Kim · Dat Nguyen · Seonwoo Min · Sungjun Cho · Moontae Lee · Honglak Lee · Seunghoon Hong -
2022 Poster: OpenSRH: optimizing brain tumor surgery using intraoperative stimulated Raman histology »
Cheng Jiang · Asadur Chowdury · Xinhai Hou · Akhil Kondepudi · Christian Freudiger · Kyle Conway · Sandra Camelo-Piragua · Daniel Orringer · Honglak Lee · Todd Hollon -
2022 Poster: UniCLIP: Unified Framework for Contrastive Language-Image Pre-training »
Janghyeon Lee · Jongsuk Kim · Hyounguk Shon · Bumsoo Kim · Seung Hwan Kim · Honglak Lee · Junmo Kim -
2022 Poster: CEDe: A collection of expert-curated datasets with atom-level entity annotations for Optical Chemical Structure Recognition »
Rodrigo Hormazabal · Changyoung Park · Soonyoung Lee · Sehui Han · Yeonsik Jo · Jaewan Lee · Ahra Jo · Seung Hwan Kim · Jaegul Choo · Moontae Lee · Honglak Lee -
2022 Expo Talk Panel: Towards learning agents for solving complex real-world tasks »
Honglak Lee -
2021 Poster: Why Do Better Loss Functions Lead to Less Transferable Features? »
Simon Kornblith · Ting Chen · Honglak Lee · Mohammad Norouzi -
2021 Poster: Improving Transferability of Representations via Augmentation-Aware Self-Supervision »
Hankook Lee · Kibok Lee · Kimin Lee · Honglak Lee · Jinwoo Shin -
2021 Poster: Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning »
Christopher Hoang · Sungryull Sohn · Jongwook Choi · Wilka Carvalho · Honglak Lee -
2021 Poster: Transformers Generalize DeepSets and Can be Extended to Graphs & Hypergraphs »
Jinwoo Kim · Saeyoon Oh · Seunghoon Hong -
2021 Poster: Environment Generation for Zero-Shot Compositional Reinforcement Learning »
Izzeddin Gur · Natasha Jaques · Yingjie Miao · Jongwook Choi · Manoj Tiwari · Honglak Lee · Aleksandra Faust -
2020 Poster: Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards »
Yijie Guo · Jongwook Choi · Marcin Moczulski · Shengyu Feng · Samy Bengio · Mohammad Norouzi · Honglak Lee -
2020 Poster: Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning »
Guangxiang Zhu · Minghao Zhang · Honglak Lee · Chongjie Zhang -
2019 Poster: High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks »
Ruben Villegas · Arkanath Pathak · Harini Kannan · Dumitru Erhan · Quoc V Le · Honglak Lee -
2018 Poster: A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks »
Kimin Lee · Kibok Lee · Honglak Lee · Jinwoo Shin -
2018 Spotlight: A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks »
Kimin Lee · Kibok Lee · Honglak Lee · Jinwoo Shin -
2018 Poster: Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies »
Sungryull Sohn · Junhyuk Oh · Honglak Lee -
2018 Poster: Learning Hierarchical Semantic Image Manipulation through Structured Representations »
Seunghoon Hong · Xinchen Yan · Thomas Huang · Honglak Lee -
2017 : Invited Talk 5 »
Honglak Lee -
2017 Workshop: Learning Disentangled Features: from Perception to Control »
Emily Denton · Siddharth Narayanaswamy · Tejas Kulkarni · Honglak Lee · Diane Bouchacourt · Josh Tenenbaum · David Pfau -
2017 Poster: Deep Recurrent Neural Network-Based Identification of Precursor microRNAs »
Seunghyun Park · Seonwoo Min · Hyun-Soo Choi · Sungroh Yoon -
2017 Poster: Value Prediction Network »
Junhyuk Oh · Satinder Singh · Honglak Lee -
2016 Poster: Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision »
Xinchen Yan · Jimei Yang · Ersin Yumer · Yijie Guo · Honglak Lee -
2016 Poster: Learning What and Where to Draw »
Scott E Reed · Zeynep Akata · Santosh Mohan · Samuel Tenka · Bernt Schiele · Honglak Lee -
2016 Oral: Learning What and Where to Draw »
Scott E Reed · Zeynep Akata · Santosh Mohan · Samuel Tenka · Bernt Schiele · Honglak Lee -
2016 Poster: Neural Universal Discrete Denoiser »
Taesup Moon · Seonwoo Min · Byunghan Lee · Sungroh Yoon -
2015 : Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning »
Honglak Lee -
2015 Symposium: Deep Learning Symposium »
Yoshua Bengio · Marc'Aurelio Ranzato · Honglak Lee · Max Welling · Andrew Y Ng -
2015 Poster: Deep Visual Analogy-Making »
Scott E Reed · Yi Zhang · Yuting Zhang · Honglak Lee -
2015 Poster: Action-Conditional Video Prediction using Deep Networks in Atari Games »
Junhyuk Oh · Xiaoxiao Guo · Honglak Lee · Richard L Lewis · Satinder Singh -
2015 Spotlight: Action-Conditional Video Prediction using Deep Networks in Atari Games »
Junhyuk Oh · Xiaoxiao Guo · Honglak Lee · Richard L Lewis · Satinder Singh -
2015 Oral: Deep Visual Analogy-Making »
Scott E Reed · Yi Zhang · Yuting Zhang · Honglak Lee -
2015 Poster: Learning Structured Output Representation using Deep Conditional Generative Models »
Kihyuk Sohn · Honglak Lee · Xinchen Yan -
2015 Poster: Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis »
Jimei Yang · Scott E Reed · Ming-Hsuan Yang · Honglak Lee -
2014 Workshop: Representation and Learning Methods for Complex Outputs »
Richard Zemel · Dale Schuurmans · Kilian Q Weinberger · Yuhong Guo · Jia Deng · Francesco Dinuzzo · Hal Daumé III · Honglak Lee · Noah A Smith · Richard Sutton · Jiaqian YU · Vitaly Kuznetsov · Luke Vilnis · Hanchen Xiong · Calvin Murdock · Thomas Unterthiner · Jean-Francis Roy · Martin Renqiang Min · Hichem SAHBI · Fabio Massimo Zanzotto -
2014 Poster: Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning »
Xiaoxiao Guo · Satinder Singh · Honglak Lee · Richard L Lewis · Xiaoshi Wang -
2014 Poster: Improved Multimodal Deep Learning with Variation of Information »
Kihyuk Sohn · Wenling Shang · Honglak Lee -
2013 Poster: Robust Image Denoising with Multi-Column Deep Neural Networks »
Forest Agostinelli · Michael R Anderson · Honglak Lee -
2012 Poster: Learning to Align from Scratch »
Gary B Huang · Marwan A Mattar · Honglak Lee · Erik Learned-Miller -
2010 Workshop: Deep Learning and Unsupervised Feature Learning »
Honglak Lee · Marc'Aurelio Ranzato · Yoshua Bengio · Geoffrey E Hinton · Yann LeCun · Andrew Y Ng -
2009 Poster: Unsupervised feature learning for audio classification using convolutional deep belief networks »
Honglak Lee · Peter Pham · Yan Largman · Andrew Y Ng -
2007 Poster: Sparse deep belief net model for visual area V2 »
Honglak Lee · Chaitanya Ekanadham · Andrew Y Ng -
2006 Poster: Efficient sparse coding algorithms, end-stopping and nCRF surround suppression »
Honglak Lee · Alexis Battle · Rajat Raina · Andrew Y Ng