Search All 2023 Events
 

142 Results

Workshop
Successor Heads: Recurring, Interpretable Attention Heads In The Wild
Rhys Gould · Euan Ong · George Ogden · Arthur Conmy
Workshop
Incorporating Additive Separability into Hamiltonian Neural Networks for Regression and Interpretation
Zi-Yu Khoo · Jonathan Sze Choong Low · Stéphane Bressan
Workshop
Single-cell Masked Autoencoder: An Accurate and Interpretable Automated Immunophenotyper
Jaesik Kim · Matei Ionita · Matthew Lee · Michelle McKeague · Ajinkya Pattekar · Mark Painter · Joost Wagenaar · Van Q. Truong · Dylan Norton · Divij Mathew · Yonghyun Nam · Sokratis Apostolidis · Patryk Orzechowski · Sang-Hyuk Jung · Jakob Woerner · Yidi Huang · Nuala Meyer · Allison Greenplate · Dokyoon Kim · John Wherry
Workshop
Sat 14:07 Scale Alone Does not Improve Mechanistic Interpretability in Vision Models
Roland S. Zimmermann · Thomas Klein · Wieland Brendel
Workshop
Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model
Yida Chen · Fernanda Viégas · Martin Wattenberg
Workshop
Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT
Zechen Zhang · Dean Hazineh · Jeffrey Chiu
Workshop
InterpreTabNet: Enhancing Interpretability of Tabular Data Using Deep Generative Models and Large Language Models
Jacob Yoke Hong Si · Rahul Krishnan · Michael Cooper · Wendy Yusi Cheng
Workshop
Adversarial Attacks on Neuron Interpretation via Activation Maximization
Alex Fulleringer · Geraldin Nanfack · Jonathan Marty · Michael Eickenberg · Eugene Belilovsky
Workshop
Benchmarking of Fast and Interpretable UF Machine Learning Potentials
Pawan Prakash
Workshop
Prototype Generation: Robust Feature Visualisation for Data Independent Interpretability
Arush Tagade · Jessica Rumbelow
Workshop
Ab-DeepGA: A generative modeling framework leveraging deep learning for antibody affinity tuning
BoRam Lee · Yara Seif · Kevin Teng · Xiao Xiao · Isha Verma · Ming-Tang Chen · Alan Cheng
Workshop
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer · Olivia Watkins · Ethan Mendes · Justin Svegliato · Luke Bailey · Tiffany Wang · Isaac Ong · Karim Elmaaroufi · Pieter Abbeel · Trevor Darrell · Alan Ritter · Stuart J Russell