11 Results
Workshop | | From Correlation to Causation: Understanding Climate Change through ML and LLM Inquiries | Shan Shan
Workshop | Sat 15:45 | Towards LLM-guided Efficient and Interpretable Multi-linear Tensor Network Rank Selection | Giorgos Iacovides · Wuyang Zhou · Danilo Mandic
Workshop | | INTERPRETABILITY OF LLM DECEPTION: UNIVERSAL MOTIF | Wannan Yang · Chen Sun · Gyorgy Buzsaki
Poster | Wed 11:00 | LLM Circuit Analyses Are Consistent Across Training and Scale | Curt Tigges · Michael Hanna · Qinan Yu · Stella Biderman
Workshop | | Extracting Paragraphs from LLM Token Activations | Nicky Pochinkov · Angelo Benoit · Lovkush Agarwal · Zainab Ali Majid · Lucile Ter-Minassian
Workshop | Sat 15:45 | Reexpress: Similarity-Distance-Magnitude Calibration | Allen Schmaltz
Poster | Thu 16:30 | Transcoders find interpretable LLM feature circuits | Jacob Dunefsky · Philippe Chlenski · Neel Nanda
Workshop | | LoFiT: Localized Fine-tuning on LLM Representations | Fangcong Yin · Xi Ye · Greg Durrett
Workshop | Sat 15:45 | Bayesian Concept Bottleneck Models with LLM Priors | Jean Feng · Avni Kothari · Lucas Zier · Chandan Singh · Yan Shuo Tan
Workshop | | Uncovering Uncertainty in Transformer Inference | Greyson Brothers · Willa Mannering · John Winder · Amber Tien
Workshop | | HarmAnalyst: Interpretable, transparent, and steerable LLM safety moderation | Jing-Jing Li · Valentina Pyatkin · Max Kleiman-Weiner · Liwei Jiang · Nouha Dziri · Anne Collins · Jana Schaich Borg · Maarten Sap · Yejin Choi · Sydney Levine