142 Results
Workshop | Successor Heads: Recurring, Interpretable Attention Heads In The Wild | Rhys Gould · Euan Ong · George Ogden · Arthur Conmy
Workshop | Incorporating Additive Separability into Hamiltonian Neural Networks for Regression and Interpretation | Zi-Yu Khoo · Jonathan Sze Choong Low · Stéphane Bressan
Workshop | Single-cell Masked Autoencoder: An Accurate and Interpretable Automated Immunophenotyper | Jaesik Kim · Matei Ionita · Matthew Lee · Michelle McKeague · Ajinkya Pattekar · Mark Painter · Joost Wagenaar · Van Q. Truong · Dylan Norton · Divij Mathew · Yonghyun Nam · Sokratis Apostolidis · Patryk Orzechowski · Sang-Hyuk Jung · Jakob Woerner · Yidi Huang · Nuala Meyer · Allison Greenplate · Dokyoon Kim · John Wherry
Workshop (Sat 14:07) | Scale Alone Does not Improve Mechanistic Interpretability in Vision Models | Roland S. Zimmermann · Thomas Klein · Wieland Brendel
Workshop | Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model | Yida Chen · Fernanda Viégas · Martin Wattenberg
Workshop | Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT | Zechen Zhang · Dean Hazineh · Jeffrey Chiu
Workshop | InterpreTabNet: Enhancing Interpretability of Tabular Data Using Deep Generative Models and Large Language Models | Jacob Yoke Hong Si · Rahul Krishnan · Michael Cooper · Wendy Yusi Cheng
Workshop | Adversarial Attacks on Neuron Interpretation via Activation Maximization | Alex Fulleringer · Geraldin Nanfack · Jonathan Marty · Michael Eickenberg · Eugene Belilovsky
Workshop | Benchmarking of Fast and Interpretable UF Machine Learning Potentials | Pawan Prakash
Workshop | Prototype Generation: Robust Feature Visualisation for Data Independent Interpretability | Arush Tagade · Jessica Rumbelow
Workshop | Ab-DeepGA: A generative modeling framework leveraging deep learning for antibody affinity tuning | BoRam Lee · Yara Seif · Kevin Teng · Xiao Xiao · Isha Verma · Ming-Tang Chen · Alan Cheng
Workshop | Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game | Sam Toyer · Olivia Watkins · Ethan Mendes · Justin Svegliato · Luke Bailey · Tiffany Wang · Isaac Ong · Karim Elmaaroufi · Pieter Abbeel · Trevor Darrell · Alan Ritter · Stuart J Russell