


39 Results

Workshop
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
Haneul Yoo · Yongjin Yang · Hwaran Lee
Workshop
SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming
Anurakt Kumar · Divyanshu Kumar · Jatan Loya · Nitin Aravind Birur · Tanay Baswa · Sahil Agarwal · Prashanth Harshangi
Workshop
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation
Jonathan Cook · Tim Rocktäschel · Jakob Foerster · Dennis Aumiller · Alex Wang
Workshop
Sat 15:45 Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
Siyuan Wang · Zhuohan Long · Zhihao Fan · Xuanjing Huang · Zhongyu Wei
Workshop
Principles of Animal Cognition for LLM Evaluations: A Case Study on Transitive Inference
Sunayana Rane · Cyrus Kirkman · Amanda Royka · Graham Todd · Ryan Law · Jacob Foster · Erica Cartmill
Workshop
Evaluating Language Models Planning Capabilities on Goal Ordering Challenges
Eran Hirsch · Guy Uziel · Ateret Anaby Tavor
Workshop
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents
Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y
Workshop
MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs
Saeid Asgari · Aliasghar Khani · Amir Khasahmadi
Workshop
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
Anthony Costarelli · Mat Allen · Roman Hauksson · Grace Sodunke · Suhas Hariharan · Carlson Cheng · Wenjie Li · Joshua Clymer · Arjun Yadav
Workshop
Sat 12:00 Skilling laws: scaling laws for LLM benchmark performance
Felipe Maia Polo · Seamus Somerstep · Leshem Choshen · Yuekai Sun · Mikhail Yurochkin
Workshop
Sat 15:45 Towards LLM-guided Efficient and Interpretable Multi-linear Tensor Network Rank Selection
Giorgos Iacovides · Wuyang Zhou · Danilo Mandic
Workshop
Report Cards: Qualitative Evaluation of LLMs Using Natural Language Summaries
Blair Yang · Fuyang Cui · Keiran Paster · Jimmy Ba · Pashootan Vaezipoor · Silviu Pitis · Michael Zhang