NeurIPS 2024

Affinity Event

Benchmark on Peer Review Toxic Detection: A Challenging Task with a New Dataset
Man Luo · Bradley Peterson · Rafael Gan · Hari Ramalingame · Navya Gangrade · Ariadne Dimarogona · Imon Banerjee · Phillip Howard

Poster

Wed 16:30

BeanCounter: A low-toxicity, large-scale, and open dataset of business-oriented text
Siyan Wang · Bradford Levy

Poster

Fri 11:00

UniTox: Leveraging LLMs to Curate a Unified Dataset of Drug-Induced Toxicity from FDA Labels
Jacob Silberg · Kyle Swanson · Elana Simon · Angela Zhang · Zaniar Ghazizadeh · Scott Ogden · Hisham Hamadeh · James Zou

Poster

Thu 16:30

Toxicity Detection for Free
Zhanhao Hu · Julien Piet · Geng Zhao · Jiantao Jiao · David Wagner

Poster

Fri 16:30

Soft-Label Integration for Robust Toxicity Classification
Zelei Cheng · Xian Wu · Jiahao Yu · Shuo Han · Xin-Qiang Cai · Xinyu Xing

Workshop

The effect of fine-tuning on language model toxicity
Will Hawkins · Brent Mittelstadt · Chris Russell

Workshop

Ablation is Not Enough to Emulate DPO: A Mechanistic Analysis of Toxicity Reduction
Yushi Yang · Filip Sondej · Harry Mayne · Adam Mahdi

Workshop

Ablation is Not Enough to Emulate DPO: Attributing Toxicity Reduction to Neurons
Yushi Yang · Filip Sondej · Harry Mayne · Adam Mahdi

Workshop

Sat 15:30

Keynote 4: TextAttack for Improving Toxicity Detectors’ Adversarial Robustness
Yanjun Qi

Workshop

Sun 14:10

The effect of fine-tuning on language model toxicity
