Workshop
Multimodal Algorithmic Reasoning Workshop
Anoop Cherian · Kuan-Chuan Peng · Suhas Lohit · Honglu Zhou · Kevin Smith · Tim Marks · Juan Carlos Niebles · Petar Veličković
Sun 15 Dec, 8:25 a.m. PST
In this workshop, we plan to gather researchers working in neural algorithmic learning, multimodal reasoning, and cognitive models of intelligence to showcase their cutting-edge research, discuss the latest challenges, and bring to the forefront problems in perception and language modeling that are often overlooked but are pivotal to achieving true artificial general intelligence. The workshop emphasizes the emerging topic of multimodal algorithmic reasoning, in which a reasoning agent must automatically deduce new algorithms or procedures for solving real-world tasks. Examples include algorithms that use multimodal foundation models for analysis, synthesis, and planning; new approaches to solving challenging vision-and-language mathematical (Olympiad-type) reasoning problems; deriving winning strategies in multimodal games; and procedures for using tools in robotic manipulation. Through talks from outstanding researchers and faculty, we hope to take a deep dive into this exciting topic at the intersection of multimodal learning and cognitive science, to understand what machine intelligence has achieved thus far and what it still lacks relative to the human way of thinking, and to inspire the audience to search for the missing rungs on the ladder to true intelligence.
Schedule
Sun 8:25 a.m. - 8:30 a.m.
Welcome (Introduction)
Anoop Cherian
Sun 8:30 a.m. - 9:15 a.m.
Keynote: Prof. Joshua B. Tenenbaum (Invited Talk)
Josh Tenenbaum
Sun 9:15 a.m. - 9:30 a.m.
Coffee Break
Sun 9:30 a.m. - 10:15 a.m.
Keynote: Learning Algorithms with GNNs and Transformers (Invited Talk)
Stefanie Jegelka
Sun 10:15 a.m. - 10:25 a.m.
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models (Oral)
Eunice Yiu · Maan Qraitem · Charlie CJ Wong · Anisa N Majhi · Yutong Bai · Shiry Ginosar · Alison Gopnik · Kate Saenko
Sun 10:25 a.m. - 10:35 a.m.
AVUA: Adaptive Video Understanding Agent Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning (Oral)
Sullam Jeoung · Goeric Huybrechts · Bhavana Ganesh · Aram Galstyan · Sravan Babu Bodapati
Sun 10:35 a.m. - 10:45 a.m.
Neural Networks for Abstraction & Reasoning (Oral)
Mikel Bober-Irizar · Soumya Banerjee
Sun 11:00 a.m. - 11:45 a.m.
Keynote: Prioritizing Perception in Multimodal Language Models (Invited Talk)
Ranjay Krishna
Sun 11:45 a.m. - 11:50 a.m.
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs (Spotlight)
Zirui Wang · Mengzhou Xia · Luxi He · Howard Chen · Yitao Liu · Richard Zhu · Kaiqu Liang · Xindi Wu · Haotian Liu · Sadhika Malladi · Alexis Chevalier · Sanjeev Arora · Danqi Chen
Sun 11:50 a.m. - 11:55 a.m.
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions (Spotlight)
Mohammadmostafa Rostamkhani · Baktash Ansariogholbake · Hoorieh Sabzevari · Farzan Rahmani · Sauleh Eetemadi
Sun 11:55 a.m. - 12:00 p.m.
HAMMR: HierArchical MultiModal React agents for generic VQA (Spotlight)
Lluis Castrejon · Thomas Mensink · Howard Zhou · Vittorio Ferrari · Andre Araujo · Jasper Uijlings
Sun 12:00 p.m. - 12:05 p.m.
Are Large-Language Models Graph Algorithmic Reasoners? (Spotlight)
Alexander Taylor · Anthony Cuturrufo · Vishal Yathish · Mingyu Derek Ma · Wei Wang
Sun 12:05 p.m. - 12:10 p.m.
TurtleBench: A Visual Programming Benchmark in Turtle Geometry (Spotlight)
Sina Rismanchian · Yasaman Razeghi · Sameer Singh · Shayan Doroudi
Sun 12:10 p.m. - 12:15 p.m.
ENTER: Event Based Interpretable Reasoning for VideoQA (Spotlight)
Hammad Ayyubi · Junzhang Liu · Zhecan Wang · Hani Alomari · Chia-Wei Tang · Ali Asgarov · Md. Atabuzzaman · Najibul Haque Sarker · Zaber Hakim · Shih-Fu Chang · Chris Thomas
Sun 12:15 p.m. - 1:30 p.m.
Lunch Break
Sun 1:30 p.m. - 2:15 p.m.
Keynote: Training Robots to Think Harder (Invited Talk)
Sergey Levine
Sun 2:15 p.m. - 4:15 p.m.
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities (Poster)
Adriel Saporta · Aahlad Manas Puli · Mark Goldstein · Rajesh Ranganath
Sun 2:15 p.m. - 4:15 p.m.
Smart Vision-Language Reasoners (Poster)
Denisa Olteanu Roberts · Lucas R Roberts
Sun 2:15 p.m. - 4:15 p.m.
Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding (Poster)
Shenghuan Sun · Alexander Schubert · Greg Goldgof · Zhiqing Sun · Tom Hartvigsen · Atul Butte · Ahmed Alaa
Sun 2:15 p.m. - 4:15 p.m.
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering (Poster)
Rabiul Awal · LE ZHANG · Aishwarya Agrawal
Sun 2:15 p.m. - 4:15 p.m.
ViLAaD: Enhancing "Attracting and Dispersing" Source-Free Domain Adaptation with Vision and Language Model (Poster)
Shuhei Tarashima · XINQI SHU · Norio Tagawa
Sun 2:15 p.m. - 4:15 p.m.
Chitrarth: Bridging Vision and Language for a Billion People (Poster)
Shaharukh Khan · Ayush Tarun · Abhinav Ravi · Ali Faraz · Praveen Kumar Pokala · Anagha Bhangare · Raja Kolla · Chandra Khatri · Shubham Agarwal
Sun 2:15 p.m. - 4:15 p.m.
LVM-Net: Efficient Long-Form Video Reasoning (Poster)
Saket Gurukar · Asim Kadav
Sun 2:15 p.m. - 4:15 p.m.
Vision-LLMs Can Fool Themselves with Self-Generated Text (Poster)
Maan Qraitem · Nazia Tasnim · Piotr Teterwak · Kate Saenko · Bryan Plummer
Sun 2:15 p.m. - 4:15 p.m.
LLAVIDAL: Benchmarking Large LAnguage VIsion Models for Daily Activities of Living (Poster)
Rajatsubhra Chakraborty · Arkaprava Sinha · Dominick Reilly · Manish Kumar Govind · Pu Wang · francois bremond · Srijan Das
Sun 2:15 p.m. - 4:15 p.m.
AVUA: Adaptive Video Understanding Agent Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning (Poster)
Sullam Jeoung · Goeric Huybrechts · Bhavana Ganesh · Aram Galstyan · Sravan Babu Bodapati
Sun 2:15 p.m. - 4:15 p.m.
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models (Poster)
Eunice Yiu · Maan Qraitem · Charlie CJ Wong · Anisa N Majhi · Yutong Bai · Shiry Ginosar · Alison Gopnik · Kate Saenko
Sun 2:15 p.m. - 4:15 p.m.
Neural Networks for Abstraction & Reasoning (Poster)
Mikel Bober-Irizar · Soumya Banerjee
Sun 2:15 p.m. - 4:15 p.m.
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs (Poster)
Zirui Wang · Mengzhou Xia · Luxi He · Howard Chen · Yitao Liu · Richard Zhu · Kaiqu Liang · Xindi Wu · Haotian Liu · Sadhika Malladi · Alexis Chevalier · Sanjeev Arora · Danqi Chen
Sun 2:15 p.m. - 4:15 p.m.
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions (Poster)
Mohammadmostafa Rostamkhani · Baktash Ansariogholbake · Hoorieh Sabzevari · Farzan Rahmani · Sauleh Eetemadi
Sun 2:15 p.m. - 4:15 p.m.
HAMMR: HierArchical MultiModal React agents for generic VQA (Poster)
Lluis Castrejon · Thomas Mensink · Howard Zhou · Vittorio Ferrari · Andre Araujo · Jasper Uijlings
Sun 2:15 p.m. - 4:15 p.m.
Are Large-Language Models Graph Algorithmic Reasoners? (Poster)
Alexander Taylor · Anthony Cuturrufo · Vishal Yathish · Mingyu Derek Ma · Wei Wang
Sun 2:15 p.m. - 4:15 p.m.
TurtleBench: A Visual Programming Benchmark in Turtle Geometry (Poster)
Sina Rismanchian · Yasaman Razeghi · Sameer Singh · Shayan Doroudi
Sun 2:15 p.m. - 4:15 p.m.
ENTER: Event Based Interpretable Reasoning for VideoQA (Poster)
Hammad Ayyubi · Junzhang Liu · Zhecan Wang · Hani Alomari · Chia-Wei Tang · Ali Asgarov · Md. Atabuzzaman · Najibul Haque Sarker · Zaber Hakim · Shih-Fu Chang · Chris Thomas
Sun 4:15 p.m. - 5:00 p.m.
Keynote: LLM Posteriors over Functions as a New Output Modality (Invited Talk)
David Duvenaud
Sun 5:00 p.m. - 5:05 p.m.
Closing Remarks
Anoop Cherian