Computer Vision is a mature field with a long history of academic research, but recent advances in deep learning have given machine learning models new capabilities for understanding visual content. There have been tremendous improvements on problems such as classification, detection, and segmentation, which are basic proxies for a model's ability to understand visual content. These improvements are accompanied by a steep rise in the adoption of Computer Vision in industry at scale, and by more complex tasks such as Image Captioning and Visual Q&A, which go well beyond the classical problems and open the door to a whole new world of possibilities. As industrial applications mature, the challenges slowly shift towards data, scale, and the move from purely visual data to multi-modal data.
The unprecedented adoption of Computer Vision in numerous real-world applications, which process billions of pieces of "live" media content daily, raises a new set of challenges, including:
1. Efficient Data Collection (Smart sampling, weak annotations, ...)
2. Evaluating performance in the wild (long tails, embarrassing mistakes, calibration)
3. Incremental learning: Evolve systems incrementally in complex environments (new data, new categories, federated architectures ...)
4. Handling tradeoffs: Computation vs Accuracy vs Supervision
5. Handling various output types (binary predictions, embeddings, etc.)
6. Machine learning feedback loops
7. Minimizing technical debt as the system matures
8. On-device vs On-cloud vs Split
9. Multi-modal content understanding
We will bring together researchers and practitioners who are interested in addressing this new set of challenges and provide a venue to share how industry and academia approach these problems. We will invite prominent speakers from academia and industry to give their perspectives on these challenges. In addition, we will have 5-minute spotlights for selected papers submitted to the workshop and a poster session for all selected submissions. Submission topics should relate to the challenges listed above. We will end the session with a panel discussion, including the speakers, on the future of large-scale vision and its applications in the wild.
In the second part we will look at how this applies specifically to video understanding. Video understanding aims at developing computational methods that can interpret videos at different semantic levels. Applications include video categorization, event detection, semantic segmentation, description, summarization, tagging, content-based retrieval, surveillance, and many more. Although the field of video analytics has witnessed significant progress in the last two decades, most problems in this area remain largely unsolved. In recent years video understanding has become an even more critical and timely problem because of the tremendous growth of videos on the Internet, most of which do not contain tags or descriptions and thus require automatic analysis to become searchable or browsable. At the same time, the rise of online video repositories represents an opportunity to create new pivotal large-scale datasets for research in this area. Given the recent breakthroughs achieved by deep learning in other big data domains, we believe that video understanding may very well be on the verge of a technical revolution that will spur significant advances in this area.
To foster further progress by the research community, we propose to organize a one-day workshop to discuss emerging innovations and ideas about the problems and challenges related to video understanding. The workshop will consist of a series of invited talks from researchers in this area. In addition, we will publicly announce and present a new large-scale benchmark for video comprehension [1] that has the potential to become an instrumental resource for future research in this field. Compared to existing video datasets, our proposed benchmark has a much larger scale, and it casts video understanding in the novel form of multiple-choice tests that assess the ability of an algorithm to comprehend the semantics of a video.
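As an illustration of the multiple-choice format, the sketch below shows one way such a test could be scored. It is not the benchmark's actual evaluation protocol: the function name `multiple_choice_accuracy`, the per-option score layout, and the sample numbers are all assumptions made for this example. It assumes a model assigns one score to each candidate answer and that the highest-scoring option is taken as its prediction.

```python
from typing import Sequence


def multiple_choice_accuracy(
    option_scores: Sequence[Sequence[float]],
    correct: Sequence[int],
) -> float:
    """Return the fraction of questions whose top-scored option is correct.

    option_scores[i][j] is a hypothetical model score for option j of
    question i; correct[i] is the index of the right option.
    """
    if len(option_scores) != len(correct):
        raise ValueError("need exactly one correct index per question")
    if not correct:
        raise ValueError("need at least one question")

    hits = 0
    for scores, answer in zip(option_scores, correct):
        # The predicted option is the one with the highest score.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        hits += int(predicted == answer)
    return hits / len(correct)


# Made-up scores for three questions with four options each.
scores = [
    [0.1, 0.7, 0.1, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.2, 0.2, 0.5, 0.1],
]
answers = [1, 2, 2]
accuracy = multiple_choice_accuracy(scores, answers)
```

With four options per question, random guessing would be expected to reach an accuracy of about 0.25, which gives a natural baseline for reading such a score.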
This workshop will be the first of a series of annual meetings that we will organize to stimulate steady progress in this area. In each subsequent edition of this workshop, we will host an annual challenge on our continuously expanding video comprehension benchmark in order to motivate students and researchers to push the envelope on this problem. We hope to bring together researchers with common interests in video analysis to share, learn, and make good progress toward better video understanding methods.
[1] D. Tran, M. Paluri, and L. Torresani, “ViCom: Benchmark and Methods for Video Comprehension,” CoRR, abs/1606.07373, July 2016. http://arxiv.org/abs/1606.07373
Sat 12:00 a.m. - 12:10 a.m. | Introduction (Talk) | Lorenzo Torresani
Sat 12:10 a.m. - 12:40 a.m. | Invited Talk - Learning to see objects by listening (Talk) | Antonio Torralba
Sat 12:40 a.m. - 1:10 a.m. | Invited Talk - Recent Progress in Spatio-Temporal Action Location (Talk) | Cordelia Schmid
Sat 1:10 a.m. - 1:30 a.m. | CV @ Scale Challenges (Talk) | Manohar Paluri · Gal Chechik
Sat 2:00 a.m. - 2:30 a.m. | ViCom: Benchmark and Methods for Video Comprehension (Talk) | Du Tran · Maksim Bolonkin · Manohar Paluri · Lorenzo Torresani
Sat 2:30 a.m. - 2:45 a.m. | Knowledge Acquisition for Visual Question Answering via Iterative Querying (Spotlight) | Yuke Zhu · Joseph Lim · Li Fei-Fei
Sat 2:45 a.m. - 3:00 a.m. | Tag Prediction at Flickr: a View from the Darkroom (Spotlight) | Kofi A Boakye
Sat 5:00 a.m. - 5:30 a.m. | Invited Talk - TorontoCity Benchmark: Towards Building Large Scale 3D Models of the World (Talk) | Raquel Urtasun
Sat 5:30 a.m. - 5:45 a.m. | What makes ImageNet good for Transfer Learning? (Spotlight) | Jacob MY Huh · Pulkit Agrawal · Alexei Efros
Sat 5:45 a.m. - 6:00 a.m. | PororoQA: Cartoon Video Series Dataset for Story Understanding (Spotlight) | KyungMin Kim · Min-Oh Heo · Byoung-Tak Zhang
Sat 6:00 a.m. - 7:00 a.m. | Poster Presentations (Poster Session)
Sat 7:00 a.m. - 7:30 a.m. | Invited Talk - Self Supervised Learning of Visual Representations (Talk) | Abhinav Gupta
Sat 7:30 a.m. - 8:00 a.m. | Invited Talk - Scaling-up: Image Super-resolution and Compression for the masses (Talk) | Zehan Wang
Author Information
Manohar Paluri (Facebook)
Lorenzo Torresani (Dartmouth/Facebook)
Lorenzo Torresani is an Associate Professor with tenure in the Computer Science Department at Dartmouth College and a Research Scientist at Facebook AI. He received a Laurea Degree in Computer Science with summa cum laude honors from the University of Milan (Italy) in 1996, and an M.S. and a Ph.D. in Computer Science from Stanford University in 2001 and 2005, respectively. In the past, he has worked at several industrial research labs including Microsoft Research Cambridge, Like.com and Digital Persona. His research interests are in computer vision and deep learning. He is the recipient of several awards, including a CVPR best student paper prize, a National Science Foundation CAREER Award, a Google Faculty Research Award, three Facebook Faculty Awards, and a Fulbright U.S. Scholar Award.
Gal Chechik (Google, BIU)
Dario Garcia (Facebook)
Du Tran (Facebook)