Selective rationalization explains the predictions of complex neural networks by finding a small subset of the input that is sufficient to predict the model's output. The selection mechanism is commonly integrated into the model itself by specifying a two-component cascaded system consisting of a rationale generator, which makes a binary selection of the input features (the rationale), and a predictor, which predicts the output based only on the selected features. The components are trained jointly to optimize prediction performance. In this paper, we reveal a major problem with this cooperative rationalization paradigm: model interlocking. Interlocking arises when the predictor overfits to the features selected by the generator, thus reinforcing the generator's selection even if the selected rationales are sub-optimal. The fundamental cause of the interlocking problem is that the rationalization objective to be minimized is concave with respect to the generator's selection policy. We propose a new rationalization framework, called A2R, which introduces a third component into the architecture: a predictor driven by soft attention as opposed to selection. The generator now realizes both soft and hard attention over the features, and these are fed into the two different predictors. While the generator still seeks to support the original predictor's performance, it also minimizes a gap between the two predictors. As we will show theoretically, since the attention-based predictor exhibits a better convexity property, A2R can overcome the concavity barrier. Our experiments on two synthetic benchmarks and two real datasets demonstrate that A2R can significantly alleviate the interlocking problem and find explanations that better align with human judgments.
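The three-component layout described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`generator`, `predictor`, `a2r_losses`), the toy logistic predictor with shared weights, the top-k selection rule, the weight `lam`, and the use of Jensen-Shannon divergence as the "gap" between the two predictors are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def generator(scores, k):
    # Hypothetical generator: from one set of per-feature scores, produce
    # soft attention weights and a hard binary mask (top-k, an assumption).
    soft = softmax(scores)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    hard = [1.0 if i in top else 0.0 for i in range(len(scores))]
    return soft, hard

def predictor(features, weights, w, b):
    # Toy binary classifier on the weighted mean of the features.
    # Returns [p(y=0), p(y=1)].
    pooled = sum(f * a for f, a in zip(features, weights)) / sum(weights)
    p1 = 1.0 / (1.0 + math.exp(-(w * pooled + b)))
    return [1.0 - p1, p1]

def cross_entropy(probs, label):
    return -math.log(probs[label])

def js_divergence(p, q):
    # Jensen-Shannon divergence, used here as an assumed gap measure.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def a2r_losses(features, scores, label, k, lam=1.0):
    soft, hard = generator(scores, k)
    # Selection-based predictor sees only the hard-selected features.
    p_hard = predictor(features, hard, w=1.0, b=0.0)
    # Attention-based predictor sees all features, softly weighted.
    p_soft = predictor(features, soft, w=1.0, b=0.0)
    gap = js_divergence(p_hard, p_soft)
    hard_loss = cross_entropy(p_hard, label)
    return {
        "mask": hard,
        "hard_loss": hard_loss,
        "soft_loss": cross_entropy(p_soft, label),
        "gap": gap,
        # Generator objective: support the hard predictor and close the gap.
        "generator_loss": hard_loss + lam * gap,
    }

out = a2r_losses([2.0, -1.0, 0.5, 3.0], [1.0, 0.0, -1.0, 2.0], label=1, k=2)
print(out["mask"], round(out["gap"], 4))
```

In this sketch the generator's loss combines the selection-based predictor's loss with the gap term, mirroring the abstract's statement that the generator both supports the original predictor and minimizes a gap between the two predictors. A real system would train all three components with gradient-based optimization, which this toy forward pass omits.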
Author Information
Mo Yu (Johns Hopkins University)
Yang Zhang (MIT-IBM Watson AI Lab)
Shiyu Chang (UC Santa Barbara)
Tommi Jaakkola (MIT)
Tommi Jaakkola is a professor of Electrical Engineering and Computer Science at MIT. He received an M.Sc. degree in theoretical physics from Helsinki University of Technology and a Ph.D. in computational neuroscience from MIT. Following a Sloan postdoctoral fellowship in computational molecular biology, he joined the MIT faculty in 1998. His research interests include statistical inference, graphical models, and large-scale modern estimation problems with predominantly incomplete data.
More from the Same Authors
-
2021 Spotlight: GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles »
Octavian Ganea · Lagnajit Pattanaik · Connor Coley · Regina Barzilay · Klavs Jensen · William Green · Tommi Jaakkola -
2021 Spotlight: PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition »
Cheng-I Jeff Lai · Yang Zhang · Alexander Liu · Shiyu Chang · Yi-Lun Liao · Yung-Sung Chuang · Kaizhi Qian · Sameer Khurana · David Cox · Jim Glass -
2021 : Consistent Accelerated Inference via Confident Adaptive Transformers »
Tal Schuster · Adam Fisch · Tommi Jaakkola · Regina Barzilay -
2021 : Fragment-Based Sequential Translation for Molecular Optimization »
Benson Chen · Xiang Fu · Regina Barzilay · Tommi Jaakkola -
2021 : Crystal Diffusion Variational Autoencoder for Periodic Material Generation »
Tian Xie · Xiang Fu · Octavian Ganea · Regina Barzilay · Tommi Jaakkola -
2022 : DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking »
Gabriele Corso · Hannes Stärk · Bowen Jing · Regina Barzilay · Tommi Jaakkola -
2022 : Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations »
Xiang Fu · Zhenghao Wu · Wujie Wang · Tian Xie · Sinan Keten · Rafael Gomez-Bombarelli · Tommi Jaakkola -
2022 : Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem »
Jason Yim · Brian L Trippe · Doug Tischer · David Baker · Tamara Broderick · Regina Barzilay · Tommi Jaakkola -
2022 : Is Conditional Generative Modeling all you need for Decision-Making? »
Anurag Ajay · Yilun Du · Abhi Gupta · Josh Tenenbaum · Tommi Jaakkola · Pulkit Agrawal -
2022 : Molecular Docking with Diffusion Generative Models »
Gabriele Corso · Hannes Stärk · Bowen Jing · Regina Barzilay · Tommi Jaakkola -
2023 Poster: Restart Sampling for Improving Generative Processes »
Yilun Xu · Mingyang Deng · Xiang Cheng · Yonglong Tian · Ziming Liu · Tommi Jaakkola -
2023 Poster: Compositional Sculpting of Iterative Generative Processes »
Timur Garipov · Sebastiaan De Peuter · Ge Yang · Vikas Garg · Samuel Kaski · Tommi Jaakkola -
2023 Poster: Hierarchical Planning with Foundation Models »
Anurag Ajay · Seungwook Han · Yilun Du · Shuang Li · Abhi Gupta · Tommi Jaakkola · Josh Tenenbaum · Leslie Kaelbling · Akash Srivastava · Pulkit Agrawal -
2023 Poster: Fundamental Limits and Tradeoffs in Invariant Representation Learning »
Han Zhao · Chen Dan · Bryon Aragam · Tommi Jaakkola · Geoffrey Gordon · Pradeep Ravikumar -
2022 Spotlight: Poisson Flow Generative Models »
Yilun Xu · Ziming Liu · Max Tegmark · Tommi Jaakkola -
2022 Spotlight: Lightning Talks 6B-1 »
Yushun Zhang · Duc Nguyen · Jiancong Xiao · Wei Jiang · Yaohua Wang · Yilun Xu · Zhen LI · Anderson Ye Zhang · Ziming Liu · Fangyi Zhang · Gilles Stoltz · Congliang Chen · Gang Li · Yanbo Fan · Ruoyu Sun · Naichen Shi · Yibo Wang · Ming Lin · Max Tegmark · Lijun Zhang · Jue Wang · Ruoyu Sun · Tommi Jaakkola · Senzhang Wang · Zhi-Quan Luo · Xiuyu Sun · Zhi-Quan Luo · Tianbao Yang · Rong Jin -
2022 : Invited Talk: Tommi Jaakkola »
Tommi Jaakkola -
2022 Poster: Fairness Reprogramming »
Guanhua Zhang · Yihua Zhang · Yang Zhang · Wenqi Fan · Qing Li · Sijia Liu · Shiyu Chang -
2022 Poster: Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing »
Yonggan Fu · Yang Zhang · Kaizhi Qian · Zhifan Ye · Zhongzhi Yu · Cheng-I Jeff Lai · Celine Lin -
2022 Poster: Torsional Diffusion for Molecular Conformer Generation »
Bowen Jing · Gabriele Corso · Jeffrey Chang · Regina Barzilay · Tommi Jaakkola -
2022 Poster: Poisson Flow Generative Models »
Yilun Xu · Ziming Liu · Max Tegmark · Tommi Jaakkola -
2021 Poster: TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up »
Yifan Jiang · Shiyu Chang · Zhangyang Wang -
2021 Poster: Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks »
Yonggan Fu · Qixuan Yu · Yang Zhang · Shang Wu · Xu Ouyang · David Cox · Yingyan Lin -
2021 Poster: GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles »
Octavian Ganea · Lagnajit Pattanaik · Connor Coley · Regina Barzilay · Klavs Jensen · William Green · Tommi Jaakkola -
2021 Poster: PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition »
Cheng-I Jeff Lai · Yang Zhang · Alexander Liu · Shiyu Chang · Yi-Lun Liao · Yung-Sung Chuang · Kaizhi Qian · Sameer Khurana · David Cox · Jim Glass -
2020 Poster: Training Stronger Baselines for Learning to Optimize »
Tianlong Chen · Weiyi Zhang · Zhou Jingyang · Shiyu Chang · Sijia Liu · Lisa Amini · Zhangyang Wang -
2020 Spotlight: Training Stronger Baselines for Learning to Optimize »
Tianlong Chen · Weiyi Zhang · Zhou Jingyang · Shiyu Chang · Sijia Liu · Lisa Amini · Zhangyang Wang -
2020 Poster: The Lottery Ticket Hypothesis for Pre-trained BERT Networks »
Tianlong Chen · Jonathan Frankle · Shiyu Chang · Sijia Liu · Yang Zhang · Zhangyang Wang · Michael Carbin -
2019 Poster: Solving graph compression via optimal transport »
Vikas Garg · Tommi Jaakkola -
2019 Poster: Generative Models for Graph-Based Protein Design »
John Ingraham · Vikas Garg · Regina Barzilay · Tommi Jaakkola -
2019 Poster: Direct Optimization through $\arg \max$ for Discrete Variational Auto-Encoder »
Guy Lorberbom · Andreea Gane · Tommi Jaakkola · Tamir Hazan -
2019 Poster: Tight Certificates of Adversarial Robustness for Randomly Smoothed Classifiers »
Guang-He Lee · Yang Yuan · Shiyu Chang · Tommi Jaakkola -
2019 Poster: A Game Theoretic Approach to Class-wise Selective Rationalization »
Shiyu Chang · Yang Zhang · Mo Yu · Tommi Jaakkola -
2018 : Invited Talk Session 3 »
Alexandre Tkatchenko · Tommi Jaakkola · Jennifer Wei -
2018 Poster: Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization »
Sijia Liu · Bhavya Kailkhura · Pin-Yu Chen · Paishun Ting · Shiyu Chang · Lisa Amini -
2018 Poster: Towards Robust Interpretability with Self-Explaining Neural Networks »
David Alvarez-Melis · Tommi Jaakkola -
2017 Poster: Local Aggregative Games »
Vikas Garg · Tommi Jaakkola -
2017 Poster: Style Transfer from Non-Parallel Text by Cross-Alignment »
Tianxiao Shen · Tao Lei · Regina Barzilay · Tommi Jaakkola -
2017 Spotlight: Style Transfer from Non-parallel Text by Cross-Alignment »
Tianxiao Shen · Tao Lei · Regina Barzilay · Tommi Jaakkola -
2017 Poster: Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network »
Wengong Jin · Connor Coley · Regina Barzilay · Tommi Jaakkola -
2017 Poster: Dilated Recurrent Neural Networks »
Shiyu Chang · Yang Zhang · Wei Han · Mo Yu · Xiaoxiao Guo · Wei Tan · Xiaodong Cui · Michael Witbrock · Mark Hasegawa-Johnson · Thomas Huang -
2016 Poster: Learning Tree Structured Potential Games »
Vikas Garg · Tommi Jaakkola -
2015 Poster: From random walks to distances on unweighted graphs »
Tatsunori Hashimoto · Yi Sun · Tommi Jaakkola -
2015 Poster: Principal Differences Analysis: Interpretable Characterization of Differences between Distributions »
Jonas Mueller · Tommi Jaakkola -
2014 Poster: Accelerated Mini-batch Randomized Block Coordinate Descent Method »
Tuo Zhao · Mo Yu · Yiming Wang · Raman Arora · Han Liu -
2014 Poster: Controlling privacy in recommender systems »
Yu Xin · Tommi Jaakkola -
2013 Poster: Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions »
Tamir Hazan · Subhransu Maji · Joseph Keshet · Tommi Jaakkola -
2013 Poster: On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations »
Tamir Hazan · Subhransu Maji · Tommi Jaakkola -
2012 Workshop: Machine Learning Approaches to Mobile Context Awareness »
Katherine Ellis · Gert Lanckriet · Tommi Jaakkola · Lenny Grokop -
2012 Poster: Convergence Rate Analysis of MAP Coordinate Minimization Algorithms »
Ofer Meshi · Tommi Jaakkola · Amir Globerson -
2011 Tutorial: Linear Programming Relaxations for Graphical Models »
Amir Globerson · Tommi Jaakkola -
2010 Spotlight: More data means less inference: A pseudo-max approach to structured learning »
David Sontag · Ofer Meshi · Tommi Jaakkola · Amir Globerson -
2010 Poster: More data means less inference: A pseudo-max approach to structured learning »
David Sontag · Ofer Meshi · Tommi Jaakkola · Amir Globerson -
2008 Workshop: Approximate inference - how far have we come? »
Amir Globerson · David Sontag · Tommi Jaakkola -
2008 Poster: Clusters and Coarse Partitions in LP Relaxations »
David Sontag · Amir Globerson · Tommi Jaakkola -
2008 Spotlight: Clusters and Coarse Partitions in LP Relaxations »
David Sontag · Amir Globerson · Tommi Jaakkola -
2007 Oral: New Outer Bounds on the Marginal Polytope »
David Sontag · Tommi Jaakkola -
2007 Poster: New Outer Bounds on the Marginal Polytope »
David Sontag · Tommi Jaakkola -
2007 Poster: Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations »
Amir Globerson · Tommi Jaakkola -
2006 Talk: Approximate inference using planar graph decomposition »
Amir Globerson · Tommi Jaakkola -
2006 Poster: Approximate inference using planar graph decomposition »
Amir Globerson · Tommi Jaakkola -
2006 Poster: Game Theoretic Algorithms for Protein-DNA binding »
Luis Perez-Breva · Luis E Ortiz · Chen-Hsiang Yeang · Tommi Jaakkola -
2006 Spotlight: Game Theoretic Algorithms for Protein-DNA binding »
Luis Perez-Breva · Luis E Ortiz · Chen-Hsiang Yeang · Tommi Jaakkola -
2006 Poster: Parameter Expanded Variational Bayesian Methods »
Yuan (Alan) Qi · Tommi Jaakkola