Poster | | SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types | Yutao Mou · Shikun Zhang · Wei Ye
Poster | | CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses | Jing Yao · Xiaoyuan Yi · Xing Xie
Poster | Thu 11:00 | ClashEval: Quantifying the tug-of-war between an LLM’s internal prior and external evidence | Kevin Wu · Eric Wu · James Zou
Poster | Wed 11:00 | DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios | Junchao Wu · Runzhe Zhan · Derek Wong · Shu Yang · Xinyi Yang · Yulin Yuan · Lidia Chao
Workshop | | MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs | Saeid Asgari · Aliasghar Khani · Amir Khasahmadi
Poster | Thu 11:00 | Stylebreeder: Exploring and Democratizing Artistic Styles through Text-to-Image Models | Matthew Zheng · Enis Simsar · Hidir Yesiltepe · Federico Tombari · Joel Simon · Pinar Yanardag Delul
Poster | Fri 11:00 | SeafloorAI: A Large-scale Vision-Language Dataset for Seafloor Geological Survey | Kien Nguyen · Fengchun Qiao · Arthur Trembanis · Xi Peng
Affinity Event | | ORIN: The Nigerian music benchmark dataset for Music Information Retrieval task | Sakinat Folorunso
Poster | Wed 16:30 | FindingEmo: An Image Dataset for Emotion Recognition in the Wild | Laurent Mertens · Elahe Yargholi · Hans Op de Beeck · Jan Van den Stock · Joost Vennekens
Poster | Wed 11:00 | GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps | Muhammad Umair Nasir · Steven James · Julian Togelius
Poster | Fri 11:00 | PersonalSum: A User-Subjective Guided Personalized Summarization Dataset for Large Language Models | Lemei Zhang · Peng Liu · Marcus Henriksboe · Even Lauvrak · Jon Atle Gulla · Heri Ramampiaro
Poster | Thu 16:30 | Value Imprint: A Technique for Auditing the Human Values Embedded in RLHF Datasets | Ike Obi · Rohan Pant · Srishti Shekhar Agrawal · Maham Ghazanfar · Aaron Basiletti