An emerging problem in trustworthy machine learning is to train models that produce robust interpretations for their predictions. We take a step towards solving this problem through the lens of axiomatic attribution of neural networks. Our theory is grounded in the recent work on Integrated Gradients (IG) [STY17], which axiomatically attributes a neural network’s output change to its input change. We propose training objectives in classic robust optimization models to achieve robust IG attributions. Our objectives give principled generalizations of previous objectives designed for robust predictions, and they naturally degenerate to classic soft-margin training for one-layer neural networks. We also generalize previous theory and prove that the objectives for different robust optimization models are closely related. Experiments demonstrate the effectiveness of our method, and also point to intriguing problems which hint at the need for better optimization techniques or better neural network architectures for robust attribution training.
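For context on the attribution method the abstract builds on, the following is a minimal sketch of how IG attributions are commonly approximated: a Riemann sum of gradients along the straight-line path from a baseline to the input. This is not the authors' implementation; the `model_grad` callable, the all-zero baseline, and the step count are illustrative assumptions.

```python
import numpy as np

def integrated_gradients(model_grad, x, baseline=None, steps=50):
    """Riemann-sum approximation of Integrated Gradients.

    model_grad(z) is assumed to return d f(z) / d z for the scalar model
    output f being explained (e.g. the logit of the predicted class);
    x and the baseline are arrays of the same shape.
    """
    if baseline is None:
        baseline = np.zeros_like(x)  # common (but not mandatory) choice of baseline
    # Interpolation points on the straight line from the baseline to the input.
    alphas = np.linspace(0.0, 1.0, steps + 1)[1:]
    avg_grad = np.zeros_like(x)
    for alpha in alphas:
        avg_grad += model_grad(baseline + alpha * (x - baseline))
    avg_grad /= steps
    # IG attribution: elementwise (x - baseline) times the path-averaged gradient.
    return (x - baseline) * avg_grad
```

Roughly speaking, the training objectives described in the abstract then constrain how much such attributions can change when the input is perturbed within an allowed set, which is what the robust optimization formulation controls.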
Author Information
Jiefeng Chen (University of Wisconsin-Madison)
I am currently a third-year PhD student in the Computer Science Department at the University of Wisconsin-Madison, co-advised by Prof. Yingyu Liang and Prof. Somesh Jha. I work on trustworthy machine learning, with research questions such as "How can we make machine learning models produce stable explanations of their predictions?", "How can we train models that produce robust predictions under adversarial perturbations?", and "When and why do some defense mechanisms work?". I obtained my Bachelor's degree in Computer Science from Shanghai Jiao Tong University (SJTU).
Xi Wu (Google)
Vaibhav Rastogi (Google)
Yingyu Liang (University of Wisconsin-Madison)
Somesh Jha (University of Wisconsin-Madison)
More from the Same Authors
- 2022: Domain Generalization with Nuclear Norm Regularization
  Zhenmei Shi · Yifei Ming · Ying Fan · Frederic Sala · Yingyu Liang
- 2022: Best of Both Worlds: Towards Adversarial Robustness with Transduction and Rejection
  Nils Palumbo · Yang Guo · Xi Wu · Jiefeng Chen · Yingyu Liang · Somesh Jha
- 2023 Poster: Grounding Neural Inference with Satisfiability Modulo Theories
  Matt Fredrikson · Kaiji Lu · Somesh Jha · Saranya Vijayakumar · Vijay Ganesh · Zifan Wang
- 2023 Poster: Doubly Robust Peer-To-Peer Learning Protocol
  Nicholas Franzese · Adam Dziedzic · Christopher Choquette-Choo · Mark Thomas · Muhammad Ahmad Kaleem · Stephan Rabanser · Congyu Fang · Somesh Jha · Nicolas Papernot · Xiao Wang
- 2023 Poster: Dissecting Knowledge Distillation: An Exploration of its Inner Workings and Applications
  Utkarsh Ojha · Yuheng Li · Anirudh Sundara Rajan · Yingyu Liang · Yong Jae Lee
- 2023 Poster: Provable Guarantees for Neural Networks via Gradient Feature Learning
  Zhenmei Shi · Junyi Wei · Yingyu Liang
- 2022 Spotlight: Lightning Talks 2A-2
  Harikrishnan N B · Jianhao Ding · Juha Harviainen · Yizhen Wang · Lue Tao · Oren Mangoubi · Tong Bu · Nisheeth Vishnoi · Mohannad Alhanahnah · Mikko Koivisto · Aditi Kathpalia · Lei Feng · Nithin Nagaraj · Hongxin Wei · Xiaozhu Meng · Petteri Kaski · Zhaofei Yu · Tiejun Huang · Ke Wang · Jinfeng Yi · Jian Liu · Sheng-Jun Huang · Mihai Christodorescu · Songcan Chen · Somesh Jha
- 2022 Spotlight: Robust Learning against Relational Adversaries
  Yizhen Wang · Mohannad Alhanahnah · Xiaozhu Meng · Ke Wang · Mihai Christodorescu · Somesh Jha
- 2022 Poster: Overparameterization from Computational Constraints
  Sanjam Garg · Somesh Jha · Saeed Mahloujifar · Mohammad Mahmoody · Mingyuan Wang
- 2022 Poster: Robust Learning against Relational Adversaries
  Yizhen Wang · Mohannad Alhanahnah · Xiaozhu Meng · Ke Wang · Mihai Christodorescu · Somesh Jha
- 2022 Poster: A Quantitative Geometric Approach to Neural-Network Smoothness
  Zi Wang · Gautam Prakriya · Somesh Jha
- 2021 Poster: Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles
  Jiefeng Chen · Frederick Liu · Besim Avci · Xi Wu · Yingyu Liang · Somesh Jha
- 2021 Poster: A Separation Result Between Data-oblivious and Data-aware Poisoning Attacks
  Samuel Deng · Sanjam Garg · Somesh Jha · Saeed Mahloujifar · Mohammad Mahmoody · Abhradeep Guha Thakurta
- 2020 Poster: Functional Regularization for Representation Learning: A Unified Theoretical Perspective
  Siddhant Garg · Yingyu Liang
- 2019 Poster: Attribution-Based Confidence Metric For Deep Neural Networks
  Susmit Jha · Sunny Raj · Steven Fernandes · Sumit K Jha · Somesh Jha · Brian Jalaian · Gunjan Verma · Ananthram Swami
- 2019 Poster: N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules
  Shengchao Liu · Mehmet Demirel · Yingyu Liang
- 2019 Spotlight: N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules
  Shengchao Liu · Mehmet Demirel · Yingyu Liang
- 2019 Poster: Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
  Zeyuan Allen-Zhu · Yuanzhi Li · Yingyu Liang
- 2018: Semantic Adversarial Examples by Somesh Jha
  Somesh Jha
- 2018 Poster: Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
  Yuanzhi Li · Yingyu Liang
- 2018 Spotlight: Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
  Yuanzhi Li · Yingyu Liang