NeurIPS 2023

Workshop

Successor Heads: Recurring, Interpretable Attention Heads In The Wild
Rhys Gould · Euan Ong · George Ogden · Arthur Conmy

Workshop

Incorporating Additive Separability into Hamiltonian Neural Networks for Regression and Interpretation
Zi-Yu Khoo · Jonathan Sze Choong Low · Stéphane Bressan

Workshop

Single-cell Masked Autoencoder: An Accurate and Interpretable Automated Immunophenotyper
Jaesik Kim · Matei Ionita · Matthew Lee · Michelle McKeague · Ajinkya Pattekar · Mark Painter · Joost Wagenaar · Van Q. Truong · Dylan Norton · Divij Mathew · Yonghyun Nam · Sokratis Apostolidis · Patryk Orzechowski · Sang-Hyuk Jung · Jakob Woerner · Yidi Huang · Nuala Meyer · Allison Greenplate · Dokyoon Kim · John Wherry

Workshop

Sat 14:07

Scale Alone Does not Improve Mechanistic Interpretability in Vision Models
Roland S. Zimmermann · Thomas Klein · Wieland Brendel

Workshop

Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model
Yida Chen · Fernanda Viégas · Martin Wattenberg

Workshop

Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT
Zechen Zhang · Dean Hazineh · Jeffrey Chiu

Workshop

InterpreTabNet: Enhancing Interpretability of Tabular Data Using Deep Generative Models and Large Language Models
Jacob Yoke Hong Si · Rahul Krishnan · Michael Cooper · Wendy Yusi Cheng

Workshop

Adversarial Attacks on Neuron Interpretation via Activation Maximization
Alex Fulleringer · Geraldin Nanfack · Jonathan Marty · Michael Eickenberg · Eugene Belilovsky

Workshop

Benchmarking of Fast and Interpretable UF Machine Learning Potentials
Pawan Prakash

Workshop

Prototype Generation: Robust Feature Visualisation for Data Independent Interpretability
Arush Tagade · Jessica Rumbelow

Workshop

Ab-DeepGA: A generative modeling framework leveraging deep learning for antibody affinity tuning
BoRam Lee · Yara Seif · Kevin Teng · Xiao Xiao · Isha Verma · Ming-Tang Chen · Alan Cheng

Workshop

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer · Olivia Watkins · Ethan Mendes · Justin Svegliato · Luke Bailey · Tiffany Wang · Isaac Ong · Karim Elmaaroufi · Pieter Abbeel · Trevor Darrell · Alan Ritter · Stuart J Russell
