Poster
Measures of Information Reflect Memorization Patterns
Rachit Bansal · Danish Pruthi · Yonatan Belinkov

Thu Dec 01 09:00 AM -- 11:00 AM (PST) @ Hall J #408

Neural networks are known to exploit spurious artifacts (or shortcuts) that co-occur with a target label, exhibiting heuristic memorization. On the other hand, networks have also been shown to memorize individual training examples, resulting in example-level memorization. Both kinds of memorization impede the generalization of networks beyond their training distributions. Detecting such memorization can be challenging, often requiring researchers to curate tailored test sets. In this work, we hypothesize, and subsequently show, that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization. We quantify the diversity in the neural activations through information-theoretic measures and find support for our hypothesis in experiments spanning several natural language and vision tasks. Importantly, we discover that the way information is organized across neurons points to the two forms of memorization, even for neural activations computed on unlabeled in-distribution examples. Lastly, we demonstrate the utility of our findings for the problem of model selection.
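The idea of quantifying activation diversity with information-theoretic measures can be illustrated with a small sketch. The abstract does not say which measures or estimators are used, so everything below is an assumption made for illustration: per-neuron Shannon entropy and pairwise mutual information, both estimated from equal-width binned activations. The function names (`discretize`, `entropy`, `mutual_information`, `diversity_summary`) and the `n_bins` parameter are hypothetical and not taken from the authors' code.

```python
import numpy as np

def discretize(acts, n_bins=10):
    """Map each neuron's activations (columns) to integer bin ids, equal-width bins."""
    lo = acts.min(axis=0, keepdims=True)
    hi = acts.max(axis=0, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant neurons
    ids = np.floor((acts - lo) / span * n_bins).astype(int)
    return np.clip(ids, 0, n_bins - 1)  # the maximum value falls into the last bin

def entropy(ids, n_bins):
    """Shannon entropy (bits) of one neuron's binned activations."""
    p = np.bincount(ids, minlength=n_bins) / ids.size
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(a, b, n_bins):
    """Mutual information (bits) between two neurons' binned activations."""
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (a, b), 1.0)  # joint histogram of co-occurring bin ids
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)  # marginal of neuron a, shape (n_bins, 1)
    pb = joint.sum(axis=0, keepdims=True)  # marginal of neuron b, shape (1, n_bins)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

def diversity_summary(acts, n_bins=10):
    """Mean per-neuron entropy and mean pairwise mutual information.

    acts: array of shape (n_examples, n_neurons), e.g. one layer's activations
    computed on unlabeled in-distribution inputs (an assumed setup).
    """
    ids = discretize(acts, n_bins)
    n = ids.shape[1]
    ents = [entropy(ids[:, j], n_bins) for j in range(n)]
    mis = [mutual_information(ids[:, i], ids[:, j], n_bins)
           for i in range(n) for j in range(i + 1, n)]
    mean_mi = float(np.mean(mis)) if mis else 0.0
    return float(np.mean(ents)), mean_mi
```

Under these assumptions, `diversity_summary` could be called on the activation matrix of each candidate checkpoint and the two summary numbers compared across checkpoints. Histogram-based estimates are sensitive to the bin count and the number of examples, so the values are best read as relative rather than absolute.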

Author Information

Rachit Bansal (Google Research India)
Danish Pruthi (Amazon → Indian Institute of Science (IISc), Bangalore)

I am Danish Pruthi. I currently work as an Applied Scientist at Amazon AI, contributing to their long-term model understanding efforts. I received my PhD from CMU, where my dissertation research focused on the interpretability of deep learning models.

Yonatan Belinkov (Technion)

More from the Same Authors

  • 2022 Poster: Learning to Scaffold: Optimizing Model Explanations for Teaching
    Patrick Fernandes · Marcos Treviso · Danish Pruthi · André Martins · Graham Neubig
  • 2022 Poster: Locating and Editing Factual Associations in GPT
    Kevin Meng · David Bau · Alex Andonian · Yonatan Belinkov
  • 2018: Coffee break + posters 2
    Jan Kremer · Erik McDermott · Brandon Carter · Albert Zeyer · Andreas Krug · Paul Pu Liang · Katherine Lee · Dominika Basaj · Abelino Jimenez · Lisa Fan · Gautam Bhattacharya · Tzeviya S Fuchs · David Gifford · Loren Lugosch · Orhan Firat · Benjamin Baer · JAHANGIR ALAM · Jamin Shin · Mirco Ravanelli · Paul Smolensky · Zining Zhu · Hamid Eghbal-zadeh · Skyler Seto · Imran Sheikh · Joao Felipe Santos · Yonatan Belinkov · Nadir Durrani · Oiwi Parker Jones · Shuai Tang · André Merboldt · Titouan Parcollet · Wei-Ning Hsu · Krishna Pillutla · Ehsan Hosseini-Asl · Monica Dinculescu · Alexander Amini · Ying Zhang · Taoli Cheng · Alain Tapp
  • 2018: Coffee break + posters 1
    Samuel Myer · Wei-Ning Hsu · Jialu Li · Monica Dinculescu · Lea Schönherr · Ehsan Hosseini-Asl · Skyler Seto · Oiwi Parker Jones · Imran Sheikh · Thomas Manzini · Yonatan Belinkov · Nadir Durrani · Alexander Amini · Johanna Hansen · Gabi Shalev · Jamin Shin · Paul Smolensky · Lisa Fan · Zining Zhu · Hamid Eghbal-zadeh · Benjamin Baer · Abelino Jimenez · Joao Felipe Santos · Jan Kremer · Erik McDermott · Andreas Krug · Tzeviya S Fuchs · Shuai Tang · Brandon Carter · David Gifford · Albert Zeyer · André Merboldt · Krishna Pillutla · Katherine Lee · Titouan Parcollet · Orhan Firat · Gautam Bhattacharya · JAHANGIR ALAM · Mirco Ravanelli
  • 2017 Poster: Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
    Yonatan Belinkov · Jim Glass