Workshop
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
Haneul Yoo · Yongjin Yang · Hwaran Lee

Workshop
SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming
Anurakt Kumar · Divyanshu Kumar · Jatan Loya · Nitin Aravind Birur · Tanay Baswa · Sahil Agarwal · Prashanth Harshangi

Workshop
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation
Jonathan Cook · Tim Rocktäschel · Jakob Foerster · Dennis Aumiller · Alex Wang

Workshop | Sat 15:45
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
Siyuan Wang · Zhuohan Long · Zhihao Fan · Xuanjing Huang · zhongyu wei

Workshop
Principles of Animal Cognition for LLM Evaluations: A Case Study on Transitive Inference
Sunayana Rane · Cyrus Kirkman · Amanda Royka · Graham Todd · Ryan Law · Jacob Foster · Erica Cartmill

Workshop
Evaluating Language Models Planning Capabilities on Goal Ordering Challenges
Eran Hirsch · Guy Uziel · Ateret Anaby Tavor

Workshop
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents
Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y

Workshop
MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs
Saeid Asgari · Aliasghar Khani · Amir Khasahmadi

Workshop
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
Anthony Costarelli · Mat Allen · Roman Hauksson · Grace Sodunke · Suhas Hariharan · Carlson Cheng · Wenjie Li · Joshua Clymer · Arjun Yadav

Workshop | Sat 12:00
Skilling laws: scaling laws for LLM benchmark performance
Felipe Maia Polo · Seamus Somerstep · Leshem Choshen · Yuekai Sun · Mikhail Yurochkin

Workshop | Sat 15:45
Towards LLM-guided Efficient and Interpretable Multi-linear Tensor Network Rank Selection
Giorgos Iacovides · Wuyang Zhou · Danilo Mandic

Workshop
Report Cards: Qualitative Evaluation of LLMs Using Natural Language Summaries
Blair Yang · Fuyang Cui · Keiran Paster · Jimmy Ba · Pashootan Vaezipoor · Silviu Pitis · Michael Zhang