530 Results
Poster
|
NN4SysBench: Characterizing Neural Network Verification for Computer Systems Shuyi Lin · Haoyu He · Tianhao WEI · Kaidi Xu · Huan Zhang · Gagandeep Singh · Changliu Liu · Cheng Tan |
||
Poster
|
Thu 11:00 |
TaskBench: Benchmarking Large Language Models for Task Automation Yongliang Shen · Kaitao Song · Xu Tan · Wenqi Zhang · Kan Ren · Siyu Yuan · Weiming Lu · Dongsheng Li · Yueting Zhuang |
|
Poster
|
MMM-RS: A Multi-modal, Multi-GSD, Multi-scene Remote Sensing Dataset and Benchmark for Text-to-Image Generation Jialin Luo · Yuanzhi Wang · Ziqi Gu · Yide Qiu · Shuaizhen Yao · Fuyun Wang · Chunyan Xu · Wenhua Zhang · Dan Wang · Zhen Cui |
||
Poster
|
Wed 11:00 |
A Cross-Domain Benchmark for Active Learning Thorben Werner · Johannes Burchert · Maximilian Stubbemann · Lars Schmidt-Thieme |
|
Poster
|
Fri 16:30 |
HARMONIC: Harnessing LLMs for Tabular Data Synthesis and Privacy Protection Yuxin Wang · Duanyu Feng · Yongfu Dai · Zhengyu Chen · Jimin Huang · Sophia Ananiadou · Qianqian Xie · Hao Wang |
|
Poster
|
Wed 16:30 |
MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations Yubo Ma · Yuhang Zang · Liangyu Chen · Meiqi Chen · Yizhu Jiao · Xinze Li · Xinyuan Lu · Ziyu Liu · Yan Ma · Xiaoyi Dong · Pan Zhang · Liangming Pan · Yu-Gang Jiang · Jiaqi Wang · Yixin Cao · Aixin Sun |
|
Poster
|
Fri 11:00 |
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models Patrick Chao · Edoardo Debenedetti · Alexander Robey · Maksym Andriushchenko · Francesco Croce · Vikash Sehwag · Edgar Dobriban · Nicolas Flammarion · George J. Pappas · Florian Tramer · Hamed Hassani · Eric Wong |
|
Poster
|
Wed 16:30 |
WFCRL: A Multi-Agent Reinforcement Learning Benchmark for Wind Farm Control Claire Bizon Monroc · Ana Busic · Donatien Dubuc · Jiamin Zhu |
|
Poster
|
Wed 16:30 |
StreamBench: Towards Benchmarking Continuous Improvement of Language Agents Cheng-Kuang Wu · Zhi Rui Tam · Chieh-Yen Lin · Yun-Nung (Vivian) Chen · Hung-yi Lee |
|
Poster
|
Thu 11:00 |
WikiDBs: A Large-Scale Corpus Of Relational Databases From Wikidata Liane Vogel · Jan-Micha Bodensohn · Carsten Binnig |
|
Poster
|
Wed 16:30 |
FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection Jiaqi Wang · Xiaochen Wang · Lingjuan Lyu · Jinghui Chen · Fenglong Ma |
|
Poster
|
Fri 16:30 |
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition Edoardo Debenedetti · Javier Rando · Daniel Paleka · Silaghi Florin · Dragos Albastroiu · Niv Cohen · Yuval Lemberg · Reshmi Ghosh · Rui Wen · Ahmed Salem · Giovanni Cherubin · Santiago Zanella-Beguelin · Robin Schmid · Victor Klemm · Takahiro Miki · Chenhao Li · Stefan Kraft · Mario Fritz · Florian Tramer · Sahar Abdelnabi · Lea Schönherr |