NeurIPS 2024

Workshop

Interpretable AI in Human-Machine Systems: Insights from Human-in-the-Loop Product Recommendation Engines
Pooria Assadi · Nima Safaei

Workshop

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations
Kola Ayonrinde · Michael Pearce

Workshop

Sat 15:45

Reexpress: Similarity-Distance-Magnitude Calibration
Allen Schmaltz

Workshop

Sat 15:45

Interactive Semantic Interventions for VLMs: A Human-in-the-Loop Approach to Interpretability
Lukas Klein · Kenza Amara · Carsten Lüth · Hendrik Strobelt · Mennatallah El-Assady · Paul Jaeger

Workshop

Sat 15:45

Bayesian Concept Bottleneck Models with LLM Priors
Jean Feng · Avni Kothari · Lucas Zier · Chandan Singh · Yan Shuo Tan

Workshop

Decomposing and Editing Predictions by Modeling Model Computation
Harshay Shah · Andrew Ilyas · Aleksander Madry

Workshop

NODE-GAMLSS: Interpretable Uncertainty Modelling via Deep Distributional Regression
Ananyapam De · Anton Thielmann · Benjamin Säfken

Workshop

Scalable and interpretable quantum natural language processing: an implementation on trapped ions
Tiffany Duneau · Saskia Bruhn · Gabriel Matos · Tuomas Laakkonen · Katerina Saiti · Anna Pearson · Konstantinos Meichanetzidis · Bob Coecke

Workshop

Semantic Entropy Neurons: Encoding Semantic Uncertainty in the Latent Space of LLMs
Jiatong Han · Jannik Kossen · Muhammed Razzak · Yarin Gal

Workshop

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations
Kola Ayonrinde · Michael Pearce · Lee Sharkey

Workshop

SPRINT Enables Interpretable and Ultra-Fast Virtual Screening against Thousands of Proteomes
Andrew McNutt · Abhinav Adduri · Caleb Ellington · Monica Dayao · Eric Xing · Hosein Mohimani · David Koes

Workshop

An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki · Boyi Wei · Yangsibo Huang · Peter Henderson · Florian Tramer · Javier Rando
