Search All 2024 Events
378 Results

Page 31 of 32
Workshop
VideoPhy: Evaluating Physical Commonsense for Video Generation
Hritik Bansal · Zongyu Lin · Tianyi Xie · Zeshun Zong · Michal Yarom · Yonatan Bitton · Chenfanfu Jiang · Yizhou Sun · Kai-Wei Chang · Aditya Grover
Workshop
Report Cards: Qualitative Evaluation of LLMs Using Natural Language Summaries
Blair Yang · Fuyang Cui · Keiran Paster · Jimmy Ba · Pashootan Vaezipoor · Silviu Pitis · Michael Zhang
Workshop
Critical human-AI use scenarios and interaction modes for societal impact evaluations
Lujain Ibrahim · Saffron Huang · Lama Ahmad · Markus Anderljung
Workshop
Sun 12:00 Legendre-SNN on Loihi-2: Evaluation and Insights
Ramashish Gaurav · Terrence Stewart · Yang Yi
Workshop
A Cautionary Tale on the Evaluation of Differentially Private In-Context Learning
Anjun Hu · Jiyang Guan · Philip Torr · Francesco Pinto
Workshop
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench
Yuan Li · Yue Huang · Yuli Lin · Siyuan Wu · Yao Wan · Lichao Sun
Workshop
Analyzing Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark · Govind Pimpale · Arjun Panickssery · Marius Hobbhahn · Jérémy Scheurer
Workshop
Measuring AI Agent Autonomy: Towards a Scalable Approach With Code Inspection
Merlin Stein · Peter Cihon · Gagan Bansal · Sam Manning
Workshop
Troubling taxonomies in GenAI evaluation
Glen Berman · Ned Cooper · Wesley Deng · Ben Hutchinson
Workshop
Towards Deliberating Agents: Evaluating the Ability of Large Language Models to Deliberate
Arjun Karanam · Farnaz Jahanbakhsh · Sanmi Koyejo
Workshop
Safe and Sound: Evaluating Language Models for Bias Mitigation and Understanding
Shaina Raza · Deval Pandya · Shardul Ghuge · Nifemi