Attention modules, as simple and effective tools, have not only enabled deep neural networks to achieve state-of-the-art results in many domains, but also enhanced their interpretability. Most current models use deterministic attention modules due to their simplicity and ease of optimization. Stochastic counterparts, on the other hand, are less popular despite their potential benefits. The main reason is that stochastic attention often introduces optimization issues or requires significant model changes. In this paper, we propose a scalable stochastic version of attention that is easy to implement and optimize. We construct simplex-constrained attention distributions by normalizing reparameterizable distributions, making the training process differentiable. We learn their parameters in a Bayesian framework where a data-dependent prior is introduced for regularization. We apply the proposed stochastic attention modules to various attention-based models, with applications to graph node classification, visual question answering, image captioning, machine translation, and language understanding. Our experiments show the proposed method brings consistent improvements over the corresponding baselines.
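The normalization idea in the abstract can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. It uses a lognormal as the reparameterizable distribution (the abstract does not say which distribution the authors use) and takes the scaled dot-product score as its location parameter. It also replaces the data-dependent prior with a hypothetical key-only prior whose projection `w_prior` is invented purely for illustration. The helper names `stochastic_attention` and `lognormal_kl` are likewise hypothetical. Because the noise `eps` is drawn independently of the parameters, an autodiff framework could backpropagate through `mu` and `sigma`. Normalizing the lognormal draws over the keys is equivalent to a softmax of the Gaussian-perturbed scores, so each row of the sampled weights lies on the probability simplex.

```python
import numpy as np

def stochastic_attention(q, k, v, sigma=0.5, rng=None):
    """Sample simplex-constrained attention weights by normalizing lognormal draws.

    Assumption (illustrative only): w_ij = exp(mu_ij + sigma * eps_ij) with
    eps ~ N(0, 1), where mu_ij is the scaled dot-product score.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = q.shape[-1]
    mu = q @ k.T / np.sqrt(d)                  # (n_q, n_k) location parameters
    eps = rng.standard_normal(mu.shape)        # reparameterization noise
    log_w = mu + sigma * eps                   # log of the lognormal samples
    # Subtracting the row max before exp leaves the normalized ratios unchanged
    # and avoids overflow; normalizing exp(log_w) is a softmax over keys.
    log_w = log_w - log_w.max(axis=-1, keepdims=True)
    w = np.exp(log_w)
    attn = w / w.sum(axis=-1, keepdims=True)   # each row lies on the simplex
    return attn @ v, attn, mu

def lognormal_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Elementwise KL(LogNormal(mu_q, sigma_q^2) || LogNormal(mu_p, sigma_p^2))."""
    return (np.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2)
            - 0.5)

# Usage with random toy inputs.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out, attn, mu = stochastic_attention(q, k, v, sigma=0.5, rng=rng)

# Hypothetical data-dependent prior: its location depends on the keys alone
# (w_prior is an illustrative projection, not the paper's parameterization).
w_prior = rng.standard_normal(8)
mu_p = np.broadcast_to(k @ w_prior / np.sqrt(8), mu.shape)
kl = lognormal_kl(mu, 0.5, mu_p, 1.0).sum()   # regularizer added to the task loss
```

As `sigma` shrinks toward zero, the sampled weights in this sketch approach ordinary softmax attention, so the deterministic module is recovered as a limiting case. The KL term plays the role of the prior regularizer in a variational objective.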
Author Information
Xinjie Fan (UT Austin)
Shujian Zhang (UT Austin)
Bo Chen (Xidian University)
Mingyuan Zhou (University of Texas at Austin)
More from the Same Authors
2021 Poster: Exploiting Chain Rule and Bayes' Theorem to Compare Probability Distributions
Huangjie Zheng · Mingyuan Zhou
2021 Poster: Alignment Attention by Matching Key and Query Distributions
Shujian Zhang · Xinjie Fan · Huangjie Zheng · Korawat Tanwisuth · Mingyuan Zhou
2021 Poster: Probabilistic Margins for Instance Reweighting in Adversarial Training
Qizhou Wang · Feng Liu · Bo Han · Tongliang Liu · Chen Gong · Gang Niu · Mingyuan Zhou · Masashi Sugiyama
2021 Poster: Convex Polytope Trees
Mohammadreza Armandpour · Ali Sadeghian · Mingyuan Zhou
2021 Poster: TopicNet: Semantic Graph-Guided Topic Discovery
Zhibin Duan · Yishi Xu · Bo Chen · Dongsheng Wang · Chaojie Wang · Mingyuan Zhou
2021 Poster: A Prototype-Oriented Framework for Unsupervised Domain Adaptation
Korawat Tanwisuth · Xinjie Fan · Huangjie Zheng · Shujian Zhang · Hao Zhang · Bo Chen · Mingyuan Zhou
2021 Poster: CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator
Alek Dimitriev · Mingyuan Zhou
2020 Poster: Bidirectional Convolutional Poisson Gamma Dynamical Systems
Wenchao Chen · Chaojie Wang · Bo Chen · Yicheng Liu · Hao Zhang · Mingyuan Zhou
2020 Poster: Implicit Distributional Reinforcement Learning
Yuguang Yue · Zhendong Wang · Mingyuan Zhou
2020 Poster: Deep Relational Topic Modeling via Graph Poisson Gamma Belief Network
Chaojie Wang · Hao Zhang · Bo Chen · Dongsheng Wang · Zhengjue Wang · Mingyuan Zhou
2019 Poster: Variational Graph Recurrent Neural Networks
Ehsan Hajiramezanali · Arman Hasanzadeh · Krishna Narayanan · Nick Duffield · Mingyuan Zhou · Xiaoning Qian
2019 Poster: Semi-Implicit Graph Variational Auto-Encoders
Arman Hasanzadeh · Ehsan Hajiramezanali · Krishna Narayanan · Nick Duffield · Mingyuan Zhou · Xiaoning Qian
2019 Poster: Poisson-Randomized Gamma Dynamical Systems
Aaron Schein · Scott Linderman · Mingyuan Zhou · David Blei · Hanna Wallach
2018 Poster: Nonparametric Bayesian Lomax delegate racing for survival analysis with competing risks
Quan Zhang · Mingyuan Zhou
2018 Poster: Deep Poisson gamma dynamical systems
Dandan Guo · Bo Chen · Hao Zhang · Mingyuan Zhou
2018 Poster: Dirichlet belief networks for topic structure learning
He Zhao · Lan Du · Wray Buntine · Mingyuan Zhou
2018 Poster: Parsimonious Bayesian deep networks
Mingyuan Zhou
2018 Poster: Masking: A New Perspective of Noisy Supervision
Bo Han · Jiangchao Yao · Gang Niu · Mingyuan Zhou · Ivor Tsang · Ya Zhang · Masashi Sugiyama
2018 Poster: Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data
Ehsan Hajiramezanali · Siamak Zamani Dadaneh · Alireza Karbalayghareh · Mingyuan Zhou · Xiaoning Qian
2016 Poster: Poisson-Gamma dynamical systems
Aaron Schein · Hanna Wallach · Mingyuan Zhou
2016 Oral: Poisson-Gamma dynamical systems
Aaron Schein · Hanna Wallach · Mingyuan Zhou
2015 Poster: The Poisson Gamma Belief Network
Mingyuan Zhou · Yulai Cong · Bo Chen
2014 Poster: Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling
Mingyuan Zhou
2012 Poster: Augment-and-Conquer Negative Binomial Processes
Mingyuan Zhou · Lawrence Carin
2012 Spotlight: Augment-and-Conquer Negative Binomial Processes
Mingyuan Zhou · Lawrence Carin
2011 Poster: On the Analysis of Multi-Channel Neural Spike Data
Bo Chen · David Carlson · Lawrence Carin
2009 Poster: Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations
Mingyuan Zhou · Haojun Chen · John Paisley · Lu Ren · Guillermo Sapiro · Lawrence Carin
2009 Oral: Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations
Mingyuan Zhou · Haojun Chen · John Paisley · Lu Ren · Guillermo Sapiro · Larry Carin