Workshop
eXplainable AI approaches for debugging and diagnosis
Roberto Capobianco · Biagio La Rosa · Leilani H Gilpin · Wen Sun · Alice Xiang · Alexander Feldman

Tue Dec 14 05:00 AM -- 02:30 PM (PST) @ None
Event URL: https://xai4debugging.github.io/ »

Recently, artificial intelligence (AI) has seen an explosion of deep learning (DL) models, which are able to reach super-human performance in several tasks. These improvements, however, come at a cost: DL models are "black boxes", where one feeds an input and obtains an output without understanding the motivations behind that prediction or decision. The eXplainable AI (XAI) field tries to address such problems by proposing methods that explain the behavior of these networks.
In this workshop, we narrow the XAI focus to the specific case in which developers or researchers need to debug their models and diagnose system behaviors. This type of user typically has substantial knowledge about the models themselves but needs to validate, debug, and improve them.

This is an important topic for several reasons. For example, domains like healthcare and justice require that experts be able to validate DL models before deployment. Despite this need, the development of novel deep learning models is dominated by trial-and-error phases guided by aggregated metrics and old benchmarks that tell us very little about the skills and utility of these models. Moreover, the debugging phase is a nightmare for practitioners.

Another community working on tracking and debugging machine learning models is the visual analytics community, which proposes systems that help users understand and interact with machine learning models. In recent years, methodologies that explain DL models have become central in these systems. As a result, the interaction between the XAI and visual analytics communities has become more and more important.

The workshop aims to advance the discourse by collecting novel methods and discussing challenges, issues, and goals around the usage of XAI approaches to debug and improve current deep learning models. To achieve this goal, the workshop brings together researchers and practitioners from both fields, strengthening their collaboration.

Join our Slack channel for Live and Offline Q/A with authors and presenters!

Tue 5:00 a.m. - 5:09 a.m.
Welcome (Opening)   
Roberto Capobianco
Tue 5:10 a.m. - 5:13 a.m.
Speaker Introduction (Introduction)
Wen Sun
Tue 5:14 a.m. - 5:52 a.m.
(Invited Talk)   

Machine learning has proven highly successful at solving many real-world applications ranging from information retrieval, data mining, and speech recognition to computer graphics, visualization, and human-computer interaction. However, most users treat machine learning models as "black boxes" because of their incomprehensible functions and unclear working mechanisms. Without a clear understanding of how and why a model works, the development of high-performance models typically relies on a time-consuming trial-and-error procedure. This talk presents the major challenges of explainable machine learning and exemplifies the solutions with several visual analytics techniques and examples, including data quality diagnosis, model understanding, and model diagnosis.

Shixia Liu is a professor at Tsinghua University. Her research interests include explainable machine learning, visual text analytics, and text mining. Shixia was elevated to IEEE Fellow in 2021 and inducted into the IEEE Visualization Academy in 2020. She is an associate editor-in-chief of IEEE Transactions on Visualization and Computer Graphics and an associate editor of Artificial Intelligence, IEEE Transactions on Big Data, and ACM Transactions on Intelligent Systems and Technology. She was one of the Papers Co-Chairs of IEEE VIS (VAST) 2016 and 2017 and serves on the steering committee of IEEE VIS (2020-2023).

Shixia Liu
Tue 5:53 a.m. - 6:03 a.m.
Q/A Session (Live Q/A)
Wen Sun · Shixia Liu
Tue 6:04 a.m. - 6:05 a.m.
Speaker Introduction (Introduction)   
Biagio La Rosa
Tue 6:05 a.m. - 6:19 a.m.
(Oral) [ OpenReview  link »   

The Robotics community has started to rely heavily on increasingly realistic 3D simulators for large-scale training of robots on massive amounts of data. But once robots are deployed in the real world, the simulation gap, as well as changes in the real world (e.g., lighting, object displacements), leads to errors. In this paper, we introduce Sim2RealViz, a visual analytics tool to assist experts in understanding and reducing this gap for robot ego-pose estimation tasks, i.e., the estimation of a robot's position using trained models. Sim2RealViz displays details of a given model and the performance of its instances in both simulation and the real world. Experts can identify environment differences that impact model predictions at a given location and explore, through direct interactions with the model, hypotheses to fix them. We detail the design of the tool and present case studies on the exploitation of the regression-to-the-mean bias and how it can be addressed, and on how models are perturbed by the disappearance of landmarks such as bikes.

Théo Jaunet · Guillaume Bono · Romain Vuillemot · Christian Wolf
Tue 6:20 a.m. - 6:25 a.m.
Q/A Session (Live Q/A)
Biagio La Rosa
Tue 6:25 a.m. - 6:35 a.m.
Break (10min) (Break)
Tue 6:35 a.m. - 6:37 a.m.
Speaker Introduction (Introduction)   
Biagio La Rosa
Tue 6:37 a.m. - 7:21 a.m.
(Invited Talk)   

AI is very successful at certain tasks, even exceeding human performance. Unfortunately, the most powerful methods suffer from both difficulty in explaining why a particular result was obtained and a lack of robustness. Our most powerful machine learning models are very sensitive to even small changes. Perturbations in the input data can have a dramatic impact on the output and lead to completely different results. This is of great importance in virtually all critical areas where we suffer from poor data quality, i.e., we do not have the expected i.i.d. data. Therefore, the use of AI in areas that impact human life (agriculture, climate, health, ...) has led to an increased demand for trustworthy AI. In sensitive areas where traceability, transparency and interpretability are required, explainability is now even mandatory due to regulatory requirements. One possible step to make AI more robust is to combine statistical learning with knowledge representations. For certain tasks, it may be beneficial to include a human in the loop. A human expert can - sometimes, of course, not always - bring experience, expertise and conceptual understanding to the AI pipeline. Such approaches are not only a solution from a legal perspective, but in many application areas, the "why" is often more important than a pure classification result. Consequently, both explainability and robustness can promote reliability and trust and ensure that humans remain in control, thus complementing human intelligence with artificial intelligence.

Andreas Holzinger
Tue 7:22 a.m. - 7:32 a.m.
Q/A Session (Live Q/A)
Biagio La Rosa · Andreas Holzinger
Tue 7:33 a.m. - 7:34 a.m.
Speaker Introduction (Introduction)   
Leilani H Gilpin
Tue 7:34 a.m. - 7:49 a.m.
(Oral) [ OpenReview  link »   

In this work, we propose a practical scheme to enforce monotonicity in neural networks with respect to a given subset of the dimensions of the input space. The proposed approach focuses on the setting where point-wise gradient penalties are used as a soft constraint alongside the empirical risk during training. Our results indicate that the choice of the points employed for computing such a penalty defines the regions of the input space where the desired property is satisfied. As such, previous methods result in models that are monotonic either only at the boundaries of the input space or in the small volume where the training data lies. Given this, we propose an alternative approach that uses pairs of training instances and random points to create mixtures of points that lie inside and outside of the convex hull of the training sample. Empirical evaluation carried out using different datasets shows that the proposed approach yields predictors that are monotonic in a larger volume of the space compared to previous methods. Our approach does not introduce significant computational overhead, leading to an efficient procedure that consistently achieves the best performance amongst all alternatives.

Joao Monteiro · Mohamed Osama Ahmed · Hossein Hajimirsadeghi · Greg Mori
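To make the point-wise gradient-penalty idea concrete, here is a minimal, hypothetical sketch in pure Python (not the authors' code): a finite-difference estimate of the partial derivative with respect to a feature that should be monotone, penalized at mixtures of training instances and random points, as the abstract describes. The toy model, the hinge form of the penalty, and the sampling range are all illustrative assumptions.

```python
import random

def model(w, x):
    # Toy differentiable model: f(x) = w0*x0 + w1*x0^2 + w2*x1
    return w[0]*x[0] + w[1]*x[0]**2 + w[2]*x[1]

def dfdx0(w, x, eps=1e-5):
    # Finite-difference partial derivative w.r.t. the monotone feature x0
    xp = [x[0] + eps, x[1]]
    return (model(w, xp) - model(w, x)) / eps

def monotonicity_penalty(w, train_points, low=-3.0, high=3.0):
    """Hinge penalty on negative gradients, evaluated at mixtures of
    training points and uniform random points, so that penalty points
    fall both inside and outside the convex hull of the data."""
    penalty = 0.0
    for xt in train_points:
        xr = [random.uniform(low, high) for _ in xt]   # random point
        lam = random.random()                          # mixing coefficient
        xm = [lam*a + (1 - lam)*b for a, b in zip(xt, xr)]
        g = dfdx0(w, xm)
        penalty += max(0.0, -g)   # only penalise decreasing directions
    return penalty / len(train_points)
```

In a real training loop this penalty would be added, with a weight, to the empirical risk and minimized jointly with it.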
Tue 7:50 a.m. - 7:55 a.m.
Q/A Session (Live Q/A)
Leilani H Gilpin
Tue 7:55 a.m. - 8:05 a.m.
Break (10min) (Break)
Tue 8:05 a.m. - 8:07 a.m.
Speaker Introduction (Introduction)   
Biagio La Rosa
Tue 8:07 a.m. - 8:20 a.m.
(A glimpse of the future Track)   

Recent advances in neural machine translation (NMT) led to the integration of deep learning-based systems as essential components of most professional translation workflows. As a consequence, human translators are increasingly working as post-editors for machine-translated content. This project aims to empower NMT users by improving their ability to interact with NMT models and interpret their behaviors. In this context, new tools and methodologies will be developed and adapted from other domains to improve prediction attribution, error analysis, and controllable generation for NMT systems. These components will drive the development of an interactive CAT tool conceived to improve post-editing efficiency, and their effectiveness will then be validated through a field study with professional translators.

Gabriele Sarti
Tue 8:21 a.m. - 8:26 a.m.
Q/A Session (Live Q/A)
Biagio La Rosa · Gabriele Sarti
Tue 8:27 a.m. - 8:28 a.m.
Speaker Introduction (Introduction)   
Biagio La Rosa
Tue 8:28 a.m. - 8:41 a.m.
(Oral) [ OpenReview  link »   

Deep Learning has become overly complicated and has enjoyed stellar success in solving several classical problems like image classification, object detection, etc. Several methods for explaining these decisions have been proposed. Black-box methods to generate saliency maps are particularly interesting because they do not utilize the internals of the model to explain the decision. Most black-box methods perturb the input and observe the changes in the output. We formulate saliency map generation as a sequential search problem and leverage Reinforcement Learning (RL) to accumulate evidence from input images that most strongly supports decisions made by a classifier. Such a strategy encourages searching intelligently for the perturbations that will lead to high-quality explanations. While successful black-box explanation approaches need to rely on heavy computation and suffer from small-sample approximation, the deterministic policy learned by our method makes it much more efficient during inference. Experiments on three benchmark datasets demonstrate the superiority of the proposed approach in inference time over the state of the art without hurting performance. The anonymized code can be found at https://anonymous.4open.science/r/RExL-88F8

Siddhant Agarwal · Owais Iqbal · Sree Aditya Buridi · Madda Manjusha · Abir Das
Tue 8:42 a.m. - 8:47 a.m.
Q/A Session (Live Q/A)
Biagio La Rosa
Tue 8:48 a.m. - 8:50 a.m.
Spotlight Introduction (Introduction)   
Biagio La Rosa
Tue 8:50 a.m. - 8:53 a.m.
(Spotlight) [ OpenReview  link »   

The MHC class-I pathway supports the detection of cancer and viruses by the immune system. It presents parts of proteins (peptides) from inside a cell on its membrane surface, enabling visiting immune cells that detect non-self peptides to terminate the cell. The ability to predict whether a peptide will be presented on MHC class-I molecules helps in designing vaccines so they can activate the immune system to destroy the invading disease protein. We designed a prediction model using a BERT-based architecture (ImmunoBERT) that takes as input a peptide and its surrounding regions (N- and C-terminals) along with a set of MHC-I molecules. We present a novel application of well-known interpretability techniques, SHAP and LIME, to this domain, and we use these results along with 3D structure visualizations and amino acid frequencies to understand and identify the most influential parts of the input amino acid sequences contributing to the output. In particular, we find that amino acids close to the peptides' N- and C-terminals are highly relevant. Additionally, some positions within the MHC proteins (in particular in the A, B and F pockets) are often assigned a high importance ranking, which confirms biological studies and the distances in the structure visualizations.

Hans-Christof Gasser
Tue 8:53 a.m. - 8:57 a.m.
(Spotlight) [ OpenReview  link »   

Explainable AI (XAI) methods are frequently applied to obtain qualitative insights about deep models' predictions. However, such insights need to be interpreted by a human observer to be useful. In this paper, we aim to use explanations directly to make decisions without human observers. We adopt two gradient-based explanation methods, Integrated Gradients (IG) and backprop, for the task of 3D object detection. Then, we propose a set of quantitative measures, named Explanation Concentration (XC) scores, that can be used for downstream tasks. These scores quantify the concentration of attributions within the boundaries of detected objects. We evaluate the effectiveness of XC scores via the task of distinguishing true-positive (TP) and false-positive (FP) detected objects in the KITTI and Waymo datasets. The results demonstrate an improvement of more than 100% on both datasets compared to other heuristics such as random guesses and the number of LiDAR points in the bounding box, raising confidence in XC's potential for application in more use cases. Our results also indicate that computationally expensive XAI methods like IG may not be more valuable when used quantitatively compared to simpler methods.

Sunsheng Gu · Vahdat Abdelzad · Krzysztof Czarnecki
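The paper defines XC scores precisely; as an illustration only, here is a minimal sketch in pure Python under the assumption that an XC score is the share of positive attribution mass falling inside a detected box. The 2D-grid representation and the box convention `(r0, c0, r1, c1)` are illustrative choices, not the authors' implementation.

```python
def xc_score(attributions, box):
    """Simplified Explanation Concentration: fraction of total positive
    attribution mass inside the box (r0, c0, r1, c1), half-open bounds.
    `attributions` is a 2D grid of per-pixel attribution scores."""
    inside = total = 0.0
    for r, row in enumerate(attributions):
        for c, a in enumerate(row):
            v = max(a, 0.0)          # count positive evidence only
            total += v
            if box[0] <= r < box[2] and box[1] <= c < box[3]:
                inside += v
    return inside / total if total else 0.0
```

A true-positive detection should score close to 1 (attributions concentrate on the object), while a false positive tends to score lower because attribution mass leaks outside the box.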
Tue 8:57 a.m. - 9:00 a.m.
(Spotlight) [ OpenReview  link »   

Monolithic deep learning models are typically not interpretable, and not easily transferable. They also require large amounts of data for training the millions of parameters. Alternatively, modular neural networks (MNN) have been demonstrated to solve these very issues of monolithic neural networks. However, to date, research in MNN architectures has concentrated on their performance and not on their interpretability. In this paper we would like to address this gap in research in modular neural architectures, specifically in the gated modular neural architecture (GMNN). Intuitively, GMNN should inherently be more interpretable since the gate can learn insightful problem decomposition, individual modules can learn simpler functions appropriate to the decomposition and errors can be attributed either to gating or to individual modules thereby providing either a gate level or module level diagnosis. Wouldn't that be nice? But is this really the case? In this paper we empirically analyze what each module and gate in a GMNN learns and show that GMNNs can indeed be interpretable, but current GMNN architectures and training methods do not necessarily guarantee an interpretable and transferable task decomposition.

Yamuna Krishnamurthy · Chris Watkins
Tue 9:00 a.m. - 9:03 a.m.
(Spotlight) [ OpenReview  link »   

Understanding the results of deep neural networks is an essential step towards wider acceptance of deep learning algorithms. Many approaches address the issue of interpreting artificial neural networks, but often provide divergent explanations. Moreover, different hyperparameters of an explanatory method can lead to conflicting interpretations. In this paper, we propose a technique for aggregating the feature attributions of different explanatory algorithms using a Restricted Boltzmann machine (RBM) to achieve a more accurate and robust interpretation of deep neural networks. Several challenging experiments on real-world datasets show that the proposed RBM method outperforms popular feature attribution methods and basic ensemble techniques.

Vadim Borisov · Johannes Meier · Johan Van den Heuvel · Hamed Jalali · Gjergji Kasneci
Tue 9:03 a.m. - 9:06 a.m.
(Spotlight) [ OpenReview  link »   

The filters learned by Convolutional Neural Networks (CNNs) and the feature maps these filters compute are sensitive to convolution arithmetic. Several architectural choices that dictate this arithmetic can result in feature-map artifacts. These artifacts can interfere with the downstream task and impact the accuracy and robustness. We provide a number of visual-debugging means to surface feature-map artifacts and to analyze how they emerge in CNNs. Our means help analyze the impact of these artifacts on the weights learned by the model. Guided by our analysis, model developers can make informed architectural choices that can verifiably mitigate harmful artifacts and improve the model’s accuracy and its shift robustness.

Bilal Alsallakh · Narine Kokhlikyan · Vivek Miglani · Shubham Muttepawar · Edward Wang · Sara Zhang · Orion Reblitz-Richardson
Tue 9:06 a.m. - 9:09 a.m.
(Spotlight) [ OpenReview  link »   

We typically compute aggregate statistics on held-out test data to assess the generalization of machine learning models. However, test data is only so comprehensive, and in practice, important cases are often missed. Thus, the performance of deployed machine learning models can be variable and untrustworthy. Motivated by these concerns, we develop methods to generate and correct novel model errors beyond those available in the data. We propose Defuse: a technique that trains a generative model on a classifier’s training dataset and then uses the latent space to generate new samples which are no longer correctly predicted by the classifier. For instance, given a classifier trained on the MNIST dataset that correctly predicts a test image, Defuse then uses this image to generate new similar images by sampling from the latent space. Defuse then identifies the images that differ from the label of the original test input. Defuse enables efficient labeling of these new images, allowing users to re-train a more robust model, thus improving overall model performance. We evaluate the performance of Defuse on classifiers trained on real world datasets and find it reveals novel sources of model errors.

Dylan Slack · Krishnaram Kenthapadi
Tue 9:09 a.m. - 9:12 a.m.
(Spotlight) [ OpenReview  link »   

When an image classifier outputs a wrong class label, it can be helpful to see what changes in the image would lead to a correct classification. This is the aim of algorithms generating counterfactual explanations. However, there is no easily scalable method to generate such counterfactuals. We develop a new algorithm providing counterfactual explanations for large image classifiers trained with spectral normalisation at low computational cost. We empirically compare this algorithm against baselines from the literature; our novel algorithm consistently finds counterfactuals that are much closer to the original inputs. At the same time, the realism of these counterfactuals is comparable to the baselines.

Benedikt Höltgen · Lisa Schut · Jan Brauner · Yarin Gal
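As a toy illustration of what counterfactual-explanation algorithms search for (this is not the authors' spectral-normalisation method), here is a greedy coordinate search in pure Python that perturbs an input, one small step at a time, until a classifier's score flips sign. The scoring convention and step size are illustrative assumptions.

```python
def find_counterfactual(score, x, step=0.05, max_iter=500):
    """score(x) > 0 means the target class, <= 0 means the current class.
    Starting from an input not in the target class, greedily take the
    single-coordinate step that most increases the score until the
    prediction flips.  Greedy search keeps the counterfactual close to
    the original input, a key desideratum for counterfactuals."""
    x = list(x)
    for _ in range(max_iter):
        if score(x) > 0:
            return x                      # prediction flipped: done
        best = None
        for i in range(len(x)):
            for d in (step, -step):
                cand = x[:]
                cand[i] += d
                s = score(cand)
                if best is None or s > best[0]:
                    best = (s, cand)
        x = best[1]
    return None                           # no counterfactual found
```

Gradient-based methods replace the exhaustive coordinate sweep with a gradient step, which is what makes them scale to large image classifiers.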
Tue 9:12 a.m. - 9:22 a.m.
Break (10min) (Break)
Tue 9:22 a.m. - 9:25 a.m.
Speaker Introduction (Introduction)
Alexander Feldman
Tue 9:26 a.m. - 10:04 a.m.
[IT3] Towards Reliable and Robust Model Explanations (Invited Talk)   
Himabindu Lakkaraju
Tue 10:05 a.m. - 10:15 a.m.
Q/A Session (Live Q/A)
Alexander Feldman · Himabindu Lakkaraju
Tue 10:16 a.m. - 10:17 a.m.
Speaker Introduction (Introduction)   
Roberto Capobianco
Tue 10:17 a.m. - 10:32 a.m.
(Oral) [ OpenReview  link »   

While many studies have shown that linguistic information is encoded in hidden word representations, few have studied individual neurons to show how and in which neurons it is encoded. Among these, the common approach is to use an external probe to rank neurons according to their relevance to some linguistic attribute, and to evaluate the obtained ranking using the same probe that produced it. We show that this methodology confounds distinct factors (probe quality and ranking quality) and thus we separate them. We compare two recent ranking methods and a novel one we introduce, both by probing and by causal interventions, where we modify the representations and observe the effect on the model's output. We show that encoded information and used information are not always the same, and that individual neurons can be used to control the model's output, to some extent. Our method can be used to identify how certain information is encoded, and how to manipulate it for debugging purposes.

Omer Antverg · Yonatan Belinkov
Tue 10:33 a.m. - 10:38 a.m.
Q/A Session (Live Q/A)
Roberto Capobianco
Tue 10:38 a.m. - 10:50 a.m.
Break (12min) (Break)
Tue 10:50 a.m. - 10:51 a.m.
Speaker Introduction (Introduction)   
Leilani H Gilpin
Tue 10:51 a.m. - 11:23 a.m.
(Invited Talk)   

Ascertaining that a deep network does not rely on an unknown spurious signal as basis for its output, prior to deployment, is crucial in high stakes settings like healthcare. While many post hoc explanation methods have been shown to be useful for some end tasks, recent theoretical and empirical evidence suggests that these methods may not be faithful or useful. This leaves little guidance for a practitioner or a researcher using these methods in their decision process. In this talk, we will consider three classes of post hoc explanations--feature attribution, concept activation, and training point ranking--; and ask whether these approaches can alert a practitioner as to a model's reliance on unknown spurious training signals.

Julius Adebayo
Tue 11:24 a.m. - 11:34 a.m.
Q/A Session (Live Q/A)
Leilani H Gilpin · Julius Adebayo
Tue 11:35 a.m. - 11:36 a.m.
Speaker Introduction (Introduction)   
Roberto Capobianco
Tue 11:36 a.m. - 11:48 a.m.
(Oral) [ OpenReview  link »   

Feature attribution methods are exceedingly popular in interpretable machine learning. They aim to compute the attribution of each input feature to represent its importance, but there is no consensus on the definition of "attribution", leading to many competing methods with little systematic evaluation. The lack of ground truth for feature attribution particularly complicates evaluation; to address this, we propose a dataset modification procedure where we construct attribution ground truth. Using this procedure, we evaluate three common interpretability methods: saliency maps, rationales, and attention. We identify several deficiencies and add new perspectives to the growing body of evidence questioning the correctness and reliability of these methods in the wild. Our evaluation approach is model-agnostic and can be used to assess future feature attribution method proposals as well.

Yilun Zhou · Serena Booth · Marco Tulio Ribeiro · Julie A Shah
Tue 11:49 a.m. - 11:54 a.m.
Q/A Session (Live Q/A)
Roberto Capobianco
Tue 11:55 a.m. - 12:10 p.m.
Break (15min) (Break)
Tue 12:10 p.m. - 12:11 p.m.
Speaker Introduction (Introduction)   
Roberto Capobianco
Tue 12:11 p.m. - 12:27 p.m.
(Oral) [ OpenReview  link »   

Transformer-based models are receiving increasing popularity in the field of computer vision; however, their interpretability is limited. The simplest explainability method, visualization of attention weights, performs poorly because it lacks an association between the input and the model's decisions. In this study, we propose a method to generate a saliency map with respect to a specific target category. The proposed approach builds on the idea of the Markov chain to investigate the information flow across layers of the Transformer, and combines it with integrated gradients to compute the relevance of input tokens for the model's decisions. We compare with other explainability methods using Vision Transformer as a benchmark and demonstrate that our method achieves better performance in various aspects. Our code is available in the anonymized repository: https://anonymous.4open.science/r/TransitionAttentionMaps-8C62.

Tingyi Yuan · Xuhong Li · Haoyi Xiong · Dejing Dou
Tue 12:28 p.m. - 12:32 p.m.
Q/A Session (Live Q/A)
Roberto Capobianco
Tue 12:33 p.m. - 12:36 p.m.
Speaker Introduction (Introduction)   
Alice Xiang
Tue 12:37 p.m. - 1:21 p.m.
(Invited Talk)   

Despite major efforts in recent years to improve explainability of deep neural networks, the tools we use for communicating explanations have largely remained the same: visualizations of representative inputs, salient input regions, and local model approximations. But when humans describe complex decision rules, we often use a different explanatory tool: natural language. I'll describe recent work on explaining models for computer vision tasks by automatically constructing natural language descriptions of individual neurons. These descriptions ground prediction in meaningful perceptual and linguistic abstractions, and can be used to surface unexpected model behaviors, and identify and mitigate adversarial vulnerabilities. These results show that fine-grained, automatic annotation of deep network models is both possible and practical: rich, language-based explanations produced by automated annotation procedures can surface meaningful and actionable information about deep networks.

Jacob Andreas
Tue 1:22 p.m. - 1:32 p.m.
Q/A Session
Alice Xiang · Jacob Andreas
Tue 1:33 p.m. - 1:35 p.m.
Spotlight Introduction (Introduction)   
Biagio La Rosa
Tue 1:35 p.m. - 1:39 p.m.
(Spotlight) [ OpenReview  link »   

SHAP (SHapley Additive exPlanation) values are one of the leading tools for interpreting machine learning models, with strong theoretical guarantees (consistency, local accuracy) and a wide availability of implementations and use cases. Even though computing SHAP values takes exponential time in general, TreeSHAP takes polynomial time on tree-based models. While the speedup is significant, TreeSHAP can still dominate the computation time of industry-level machine learning solutions on datasets with millions or more entries, causing delays in post-hoc model diagnosis and interpretation service. In this paper we present two new algorithms, Fast TreeSHAP v1 and v2, designed to improve the computational efficiency of TreeSHAP for large datasets. We empirically find that Fast TreeSHAP v1 is 1.5x faster than TreeSHAP while keeping the memory cost unchanged. Similarly, Fast TreeSHAP v2 is 2.5x faster than TreeSHAP, at the cost of a slightly higher memory usage, thanks to the pre-computation of expensive TreeSHAP steps. We also show that Fast TreeSHAP v2 is well-suited for multi-time model interpretations, resulting in as high as 3x faster explanation of newly incoming samples.

Jilei Yang
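For context on why TreeSHAP's polynomial-time algorithm matters, here is the exact, exponential-time Shapley computation it avoids, sketched in pure Python. The coalition-masking convention (absent features replaced by baseline values) is one common choice, not the TreeSHAP algorithm itself.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, baseline, x):
    """Exact Shapley attributions for model f at input x: average f's
    marginal gain from adding feature i over all coalitions of the other
    features.  Runs in O(2^n) -- the cost TreeSHAP avoids on trees."""
    n = len(x)
    def value(subset):
        # Features outside the coalition are masked with baseline values
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += w * (value(set(S) | {i}) - value(set(S)))
        phis.append(phi)
    return phis
```

A useful sanity check is the local-accuracy property: the attributions always sum to `f(x) - f(baseline)`, which holds for TreeSHAP and its fast variants as well.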
Tue 1:39 p.m. - 1:42 p.m.
(Spotlight) [ OpenReview  link »   

Traditionally, evaluation of explanations falls into one of two camps: proxy metrics (an algorithmic evaluation based on desirable properties) or human user studies (an experiment with real users that puts explanations to the test in real use cases). For the purpose of determining suitable explanations for a desired real-world use case, the former is efficient to compute but disconnected from the use case itself, while the latter is time-consuming to organize and often difficult to get right. We argue for the inclusion in the evaluation workflow of a new type of evaluation, called Simulated User Evaluations, that capitalizes on the strengths of both: an algorithmic evaluation grounded in real use cases. We provide a two-phase framework to conduct Simulated User Evaluations and demonstrate that, by instantiating this framework for local explanations, we can use Simulated User Evaluations to recreate findings from existing user studies for two use cases (identifying data bugs and performing forward simulation). Additionally, we demonstrate the ability to use Simulated User Evaluations to provide insight into the design of new studies.

Valerie Chen · Gregory Plumb · Nicholay Topin · Ameet S Talwalkar
Tue 1:42 p.m. - 1:45 p.m.
(Spotlight) [ OpenReview  link »   

Explainable AI has the potential to support more interactive and fluid co-creative AI systems which can creatively collaborate with people. To do this, creative AI models need to be amenable to debugging by offering eXplainable AI (XAI) features which are inspectable, understandable, and modifiable. However, currently there is very little XAI for the arts. In this work, we demonstrate how a latent variable model for music generation can be made more explainable; specifically we extend MeasureVAE which generates measures of music. We increase the explainability of the model by: i) using latent space regularisation to force some specific dimensions of the latent space to map to meaningful musical attributes, ii) providing a user interface feedback loop to allow people to adjust dimensions of the latent space and observe the results of these changes in real-time, iii) providing a visualisation of the musical attributes in the latent space to help people predict the effect of changes to latent space dimensions. We thus bridge the gap between the latent space and the generated musical outcomes in a meaningful way which makes the model and its outputs more explainable and more debuggable.

Nick Bryan-Kinns · Berker Banar · Corey Ford · Simon Colton
Tue 1:45 p.m. - 1:50 p.m.
(Spotlight) [ OpenReview  link »   

Transformer-based language models trained on large text corpora have enjoyed immense popularity in the natural language processing (NLP) community and are commonly used as a starting point for downstream NLP tasks. While these models are undeniably useful, it is a challenge to quantify their performance beyond traditional accuracy metrics. We aim to compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process. Structured relationships from training corpora may be uncovered through querying a masked language model with probing tasks. In this paper, we present a methodology to unveil a knowledge acquisition timeline by generating knowledge graph extracts from cloze "fill-in-the-blank" statements at various stages of RoBERTa's early training. We extend this analysis to a comparison of pretrained variations of BERT models (DistilBERT, BERT-base, RoBERTa). This work offers a quantitative framework to compare language models through knowledge graph extraction and showcases a part-of-speech analysis to identify the linguistic strengths of each model variant. These analyses allow the opportunity for machine learning practitioners to compare models, diagnose their models' behavioral strengths and weaknesses, and identify new targeted datasets to improve model performance.

Vinitra Swamy · Angelika Romanou · Martin Jaggi
Tue 1:50 p.m. - 1:53 p.m.
(Spotlight) [ OpenReview  link »   

In recent years, there has been significant work on increasing both interpretability and debuggability of a Deep Neural Network (DNN) by extracting a rule-based model that approximates its decision boundary. Nevertheless, current DNN rule extraction methods that consider a DNN's latent space when extracting rules, known as decompositional algorithms, are either restricted to single-layer DNNs or intractable as the size of the DNN or data grows. In this paper, we address these limitations by introducing ECLAIRE, a novel polynomial-time rule extraction algorithm capable of scaling to both large DNN architectures and large training datasets. We evaluate ECLAIRE on a wide variety of tasks, ranging from breast cancer prognosis to particle detection, and show that it consistently extracts more accurate and comprehensible rule sets than the current state-of-the-art methods while using orders of magnitude less computational resources. We make all of our methods available, including a rule set visualisation interface, through the open-source REMIX library.

Mateo Espinosa Zarlenga · Mateja Jamnik
Tue 1:53 p.m. - 1:57 p.m.
(Spotlight) [ OpenReview  link »   

Saliency methods are a popular approach for model debugging and explainability. However, in the absence of ground-truth data for what the correct maps should be, evaluating and comparing different approaches remains a long-standing challenge. The sanity checks methodology of Adebayo et al. [NeurIPS 2018] has sought to address this challenge. They argue that some popular saliency methods should not be used for explainability purposes since the maps they produce are not sensitive to the underlying model that is to be explained. In this work, we revisit the logic behind their proposed methodology. We cast the objective of ruling out a saliency method as not being sensitive to the model as a causal inference question, and use this to argue that their empirical results do not sufficiently establish their conclusions, due to a form of confounding that may be inherent to the tasks they evaluate on. On a technical level, our findings call into question the current consensus around some methods being not suitable for explainability purposes. On a broader level, our work further highlights the challenges involved with saliency map evaluation.

Gal Yona
Tue 1:57 p.m. - 2:01 p.m.
(Spotlight) [ OpenReview  link »   

Understanding and explaining the decisions of neural networks is of great importance, for safe deployment as well as for legal reasons. In this paper, we consider visual explanations for deep image classifiers that are both informative and understandable by humans. Motivated by the recent FullGrad method, we find that bringing in information from multiple layers is very effective in producing explanations. Based on this observation, we propose a new method, DeepMaps, that combines information from hidden activities. We show that our method outperforms alternative explanations with respect to metrics established in the literature, which are based on pixel perturbations. While these evaluations are based on changes in the class scores, we propose to directly consider the change in the network's decisions. Noting that perturbation-based metrics can fail to distinguish random explanations from sensible ones, we propose to measure the quality of a given explanation by comparing it to explanations for randomly selected other images. We demonstrate through experiments that DeepMaps outperforms existing methods according to the resulting evaluation metrics as well.

Agnieszka Grabska-Barwinska · Amal Rannen-Triki · Omar Rivasplata · András György
Tue 2:00 p.m. - 2:06 p.m.
Closing Remarks (Closing)   
Biagio La Rosa
Tue 2:06 p.m. - 2:30 p.m.
Poster Session (Link)  link »
-
Our Slack Channel for Q/A, social and networking (Link)  link »

Author Information

Roberto Capobianco (Sapienza University of Rome & Sony AI)
Biagio La Rosa (Sapienza University of Rome)
Leilani H Gilpin (UC Santa Cruz)

I'm a PhD student in the Department of Electrical Engineering and Computer Science (EECS, Course 6) and the Artificial Intelligence Lab (CSAIL) at MIT, working under the supervision of Professor Gerald Jay Sussman. My research is in the area of Artificial Intelligence, where I am working to help autonomous vehicles (and other autonomous machines) explain themselves. Before returning to academia, I worked at Palo Alto Research Center as a Member of Technical Staff, where I worked on anomaly detection in healthcare. I received an M.S. in Computational and Mathematical Engineering from Stanford University, and a B.S. in Computer Science with Highest Honors, a B.S. in Mathematics with Honors, and a Music Minor from UC San Diego.

Wen Sun (Cornell University)
Alice Xiang (Sony AI)
Alexander Feldman (Xerox PARC)
