31 Results
Poster
|
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval Chengzhi Lin · Ancong Wu · Junwei Liang · Jun Zhang · Wenhang Ge · Wei-Shi Zheng · Chunhua Shen |
||
Poster
|
Wed 9:00 |
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations Madeline Chantry · Shruti Vyas · Hamid Palangi · Yogesh Rawat · Vibhav Vineet |
|
Poster
|
LGDN: Language-Guided Denoising Network for Video-Language Modeling Haoyu Lu · Mingyu Ding · Nanyi Fei · Yuqi Huo · Zhiwu Lu |
||
Poster
|
Thu 14:00 |
Video Diffusion Models Jonathan Ho · Tim Salimans · Alexey Gritsenko · William Chan · Mohammad Norouzi · David Fleet |
|
Poster
|
Towards Video Text Visual Question Answering: Benchmark and Baseline Minyi Zhao · Bingjia Li · Jie Wang · Wanqing Li · Wenjing Zhou · Lan Zhang · Shijie Xuyang · Zhihang Yu · Xinkun Yu · Guangze Li · Aobotao Dai · Shuigeng Zhou |
||
Poster
|
Thu 9:00 |
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis Jian Liang · Chenfei Wu · Xiaowei Hu · Zhe Gan · Jianfeng Wang · Lijuan Wang · Zicheng Liu · Yuejian Fang · Nan Duan |
|
Poster
|
Audio-Driven Co-Speech Gesture Video Generation Xian Liu · Qianyi Wu · Hang Zhou · Yuanqi Du · Wayne Wu · Dahua Lin · Ziwei Liu |
||
Poster
|
Tue 14:00 |
Learning State-Aware Visual Representations from Audible Interactions Himangi Mittal · Pedro Morgado · Unnat Jain · Abhinav Gupta |
|
Poster
|
Tue 14:00 |
Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing Shentong Mo · Yapeng Tian |
|
Poster
|
Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding Yang Jin · Yongzhi Li · Zehuan Yuan · Yadong Mu |
||
Poster
|
Tue 14:00 |
Grounded Video Situation Recognition Zeeshan Khan · C.V. Jawahar · Makarand Tapaswi |
|
Poster
|
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering Yuanze Lin · Yujia Xie · Dongdong Chen · Yichong Xu · Chenguang Zhu · Lu Yuan |