

Search All 2024 Events
 

16 Results

Page 1 of 2
Workshop
Evaluating Language Models Planning Capabilities on Goal Ordering Challenges
Eran Hirsch · Guy Uziel · Ateret Anaby Tavor
Workshop
Analyzing Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark · Govind Pimpale · Arjun Panickssery · Marius Hobbhahn · Jérémy Scheurer
Workshop
AI Sandbagging: Language Models can Selectively Underperform on Evaluations
Teun van der Weij · Felix Hofstätter · Oliver Jaffe · Samuel Brown · Francis Ward
Poster
Fri 11:00 InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models
Linyi Li · Shijie Geng · Zhenwen Li · Yibo He · Hao Yu · Ziyue Hua · Guanghan Ning · Siwei Wang · Tao Xie · Hongxia Yang
Workshop
Sandbag Detection through Model Impairment
Cameron Tice · Philipp Kreer · Nathan Helm-Burger · Prithviraj Singh Shahani · Fedor Ryzhenkov · Teun van der Weij · Felix Hofstätter · Jacob Haimes
Workshop
CausalBench: A Comprehensive Benchmark for Evaluating Causal Reasoning Capabilities of Large Language Models
Zeyu Wang
Workshop
The Elicitation Game: Stress-Testing Capability Elicitation Techniques
Felix Hofstätter · Jayden Teoh · Teun van der Weij · Francis Ward
Poster
Wed 16:30 ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models
Shuo Liu · Kaining Ying · Hao Zhang · Yue Yang · Yuqi Lin · Tianle Zhang · Chuanhao Li · Yu Qiao · Ping Luo · Wenqi Shao · Kaipeng Zhang
Workshop
Evaluating Interventional Reasoning Capabilities of Large Language Models
Tejas Kasetty · Divyat Mahajan · Gintare Karolina Dziugaite · Alexandre Drouin · Dhanya Sridhar
Workshop
Dimensions of Generative AI Evaluation Design
Alex Dow · Jennifer Wortman Vaughan · Solon Barocas · Chad Atalla · Alexandra Chouldechova · Hanna Wallach