146 Results
Workshop · Sat 11:50
Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark
Vitali Petsiuk · Alexander E. Siemenn · Saisamrit Surbehera · Qi Qi Chin · Keith Tyser · Gregory Hunter · Arvind Raghavan · Yann Hicke · Bryan Plummer · Ori Kerret · Tonio Buonassisi · Kate Saenko · Armando Solar-Lezama · Iddo Drori

Workshop
Train Offline, Test Online: A Real Robot Learning Benchmark
Gaoyue Zhou · Victoria Dean · Mohan Kumar Srirama · Aravind Rajeswaran · Jyothish Pari · Kyle Hatch · Aryan Jain · Tianhe Yu · Pieter Abbeel · Lerrel Pinto · Chelsea Finn · Abhinav Gupta

Workshop
Benchmarking Counterfactual Reasoning Abilities about Implicit Physical Properties
Maitreya Patel · Tejas Gokhale · Chitta Baral · 'YZ' Yezhou Yang

Workshop
ℓGym: Natural Language Visual Reasoning with Reinforcement Learning
Anne Wu · Kianté Brantley · Noriyuki Kojima · Yoav Artzi

Workshop
SCERL: A Benchmark for intersecting language and safe reinforcement learning
Lan Hoang · Shivam Ratnakar · Nicolas Galichet · Akifumi Wachi · Keerthiram Murugesan · Songtao Lu · Mattia Atzeni · Michael Katz · Subhajit Chaudhury

Workshop
ProofNet: A Benchmark for Autoformalizing and Formally Proving Undergraduate-Level Mathematics Problems
Zhangir Azerbayev · Bartosz Piotrowski · Jeremy Avigad

Workshop
A Federated Learning benchmark for Drug-Target Interaction
Filip Svoboda · Gianluca Mittone · Nicholas Lane · Pietro Lió

Workshop
A Control-Centric Benchmark for Video Prediction
Stephen Tian · Chelsea Finn · Jiajun Wu

Workshop
Reliability benchmarks for image segmentation
Estefany Kelly Buchanan · Michael Dusenberry · Jie Ren · Kevin Murphy · Balaji Lakshminarayanan · Dustin Tran

Workshop
A Synthetic Limit Order Book Dataset for Benchmarking Forecasting Algorithms under Distributional Shift
Defu Cao · Yousef El-Laham · Loc Trinh · Svitlana Vyetrenko · Yan Liu

Workshop
Recommendations for Baselines and Benchmarking Approximate Gaussian Processes
Sebastian Ober · David Burt · Artem Artemev · Mark van der Wilk

Workshop
Benchmarking Robustness under Distribution Shift of Multimodal Image-Text Models
Jielin Qiu · Yi Zhu · Xingjian Shi · Zhiqiang Tang · DING ZHAO · Bo Li · Mu Li