Since its release in 2010, ImageNet has played an instrumental role in the development of deep learning architectures for computer vision, enabling neural networks to greatly outperform hand-crafted visual representations. ImageNet also quickly became the go-to benchmark for model architectures and training techniques whose influence eventually reached far beyond image classification. Today's models are getting close to "solving" the benchmark. Models trained on ImageNet have served as strong initializations for numerous downstream tasks. The ImageNet dataset has even been used for tasks going well beyond its initial purpose of training classification models: it has been leveraged and reinvented for few-shot learning, self-supervised learning, and semi-supervised learning. Interesting re-creations of the ImageNet benchmark enable the evaluation of novel challenges such as robustness, bias, and concept generalization, and more accurate labels have been provided. About ten years later, ImageNet symbolizes a decade of staggering advances in computer vision, deep learning, and artificial intelligence.
We believe now is a good time to discuss what's next: Did we solve ImageNet? What are the main lessons learned from this benchmark? What should the next generation of ImageNet-like benchmarks encompass? Is language supervision a promising alternative? How can we reflect on the diverse requirements for good datasets and models, such as fairness, privacy, security, generalization, scale, and efficiency?
Mon 4:00 a.m. - 4:30 a.m. | Opening (Opening presentation): Opening ceremony
Mon 4:30 a.m. - 5:00 a.m. | Fairness and privacy aspects of ImageNet (Talk) | Olga Russakovsky · Kaiyu Yang
Mon 5:00 a.m. - 5:30 a.m. | OpenImages: One Dataset for Many Computer Vision Tasks (Talk) | Vittorio Ferrari
Mon 5:30 a.m. - 6:00 a.m. | Object recognition in machines and brains (Talk) | Matthias Bethge
Mon 6:00 a.m. - 7:00 a.m. | Live panel: The future of ImageNet (Live panel) | Matthias Bethge · Vittorio Ferrari · Olga Russakovsky
Mon 7:30 a.m. - 7:45 a.m. | Spotlight talk: ResNet strikes back: An improved training procedure in timm (Oral session) | Hugo Touvron

The influential Residual Networks designed by He et al. remain the gold-standard architecture in numerous scientific publications. They typically serve as the default architecture in studies, or as baselines when new architectures are proposed. Yet there has been significant progress on best practices for training neural networks since the inception of the ResNet architecture in 2015. Novel optimization and data-augmentation methods have increased the effectiveness of training recipes. In this paper, we re-evaluate the performance of the vanilla ResNet-50 when trained with a procedure that integrates such advances. We share competitive training settings and pre-trained models in the timm open-source library, in the hope that they will serve as better baselines for future work. For instance, with our more demanding training setting, a vanilla ResNet-50 reaches 80.4% top-1 accuracy at resolution 224×224 on ImageNet-val without extra data or distillation. We also report the performance achieved with popular models using our training procedure.
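Among the "novel data-augmentation" ingredients that such modern training recipes combine is Mixup, which trains on convex combinations of example pairs. A minimal NumPy sketch of the idea (the function name and defaults are illustrative, not the timm implementation):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: blend two training examples and their one-hot labels,
    with the mixing weight drawn from Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))
    x = lam * x1 + (1.0 - lam) * x2   # blended image
    y = lam * y1 + (1.0 - lam) * y2   # blended (soft) label
    return x, y, lam

# Toy example: mix two 2x2 "images" with one-hot labels for classes 0 and 1.
x_a, x_b = np.zeros((2, 2)), np.ones((2, 2))
y_a, y_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x, y, lam = mixup(x_a, y_a, x_b, y_b)
```

The soft labels produced this way act as a regularizer; in practice Mixup is combined with other ingredients (CutMix, RandAugment, repeated augmentation) in the recipe the talk describes.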
Mon 7:45 a.m. - 8:45 a.m. | Poster session A (Poster session)
Mon 8:45 a.m. - 9:15 a.m. | Is ImageNet Solved? Evaluating Machine Accuracy (Talk) | Becca Roelofs
Mon 9:15 a.m. - 9:45 a.m. | From ImageNet to Image Classification (Talk) | Shibani Santurkar
Mon 9:45 a.m. - 10:15 a.m. | Are we done with ImageNet? (Talk) | Alexander Kolesnikov

We ask whether recent progress on the ImageNet classification benchmark continues to represent meaningful generalization, or whether the community has started to overfit to the idiosyncrasies of its labeling procedure. We therefore develop a significantly more robust procedure for collecting human annotations of the ImageNet validation set. Using these new labels, we reassess the accuracy of recently proposed ImageNet classifiers, and find their gains to be substantially smaller than those reported on the original labels. Furthermore, we find the original ImageNet labels to no longer be the best predictors of this independently-collected set, indicating that their usefulness in evaluating vision models may be nearing an end. Nevertheless, we find our annotation procedure to have largely remedied the errors in the original labels, reinforcing ImageNet as a powerful benchmark for future research in visual recognition.
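The evaluation protocol this abstract describes amounts to a multi-label accuracy: a model's top-1 prediction counts as correct if it falls within the set of plausible labels collected for that image, and images left without any plausible label are excluded. A minimal sketch of that scoring rule (function name and toy data are hypothetical, not the authors' code):

```python
def multi_label_accuracy(predictions, plausible_labels):
    """Score top-1 predictions against sets of plausible labels.

    predictions: list of predicted class ids, one per image.
    plausible_labels: list of sets of acceptable class ids; an empty
    set means the image is excluded from evaluation.
    """
    correct = total = 0
    for pred, labels in zip(predictions, plausible_labels):
        if not labels:          # no plausible label collected: skip
            continue
        total += 1
        correct += pred in labels
    return correct / total if total else 0.0

# Toy example: four images, one excluded; 2 of the 3 scored are correct.
preds = [3, 7, 2, 9]
labels = [{3, 5}, {1}, set(), {9}]
acc = multi_label_accuracy(preds, labels)
```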
Mon 10:15 a.m. - 11:15 a.m. | Live panel: Did we solve ImageNet? (Live panel) | Shibani Santurkar · Alexander Kolesnikov · Becca Roelofs
Mon 11:45 a.m. - 12:15 p.m. | Uncovering the Deep Unknowns of ImageNet Models: Challenges and Opportunities (Talk) | Yixuan Li
Mon 12:15 p.m. - 12:45 p.m. | ImageNet models from the trenches (Talk) | Ross Wightman
Mon 12:45 p.m. - 1:15 p.m. | Using ImageNet to Measure Robustness and Uncertainty (Talk) | Dawn Song · Dan Hendrycks
Mon 1:15 p.m. - 2:15 p.m. | Live panel: Perspectives on ImageNet (Live panel) | Dawn Song · Ross Wightman · Dan Hendrycks
Mon 2:30 p.m. - 3:00 p.m. | ImageNets of "x": ImageNet's Infrastructural Impact (Talk) | Emily Denton · Alex Hanna
Mon 3:00 p.m. - 3:30 p.m. | Live panel: ImageNets of "x": ImageNet's Infrastructural Impact (Live panel) | Emily Denton · Alex Hanna
Mon 3:45 p.m. - 4:00 p.m. | Spotlight talk: Learning Background Invariance Improves Generalization and Robustness in Self-Supervised Learning on ImageNet and Beyond (Oral session) | Chaitanya Ryali

Unsupervised representation learning is an important challenge in computer vision. Recent progress in self-supervised learning has demonstrated promising results in multiple visual tasks. An important ingredient in high-performing self-supervised methods is the use of data augmentation by training models to place different augmented views of the same image nearby in embedding space. However, commonly used augmentation pipelines treat images holistically, ignoring the semantic relevance of parts of an image (e.g. a subject vs. a background), which can lead to the learning of spurious correlations. Our work addresses this problem by investigating a class of simple, yet highly effective "background augmentations", which encourage models to focus on semantically-relevant content by discouraging them from focusing on image backgrounds. Through a systematic, comprehensive investigation, we show that background augmentations lead to improved generalization with substantial improvements (~1-2% on ImageNet) in performance across a spectrum of state-of-the-art self-supervised methods (MoCo-v2, BYOL, SwAV) on a variety of tasks, even allowing us to reach within 0.1% of supervised performance on ImageNet. We also find improved label efficiency with even larger performance improvements in limited-label settings (up to 4.2%). Further, we find improved training efficiency, attaining a benchmark accuracy of 74.4% in only 100 epochs, outperforming many recent self-supervised learning methods trained for 800-1000 epochs. Importantly, we also demonstrate that background augmentations boost generalization and robustness in a number of out-of-distribution settings, including the Backgrounds Challenge, natural adversarial examples, adversarial attacks, ImageNet-Renditions, and ImageNet-ReaL. We also make progress on completely unsupervised saliency detection, generating in the process the saliency masks that we use for background augmentations.
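The core compositing step of a background augmentation can be sketched in a few lines: given a foreground mask for an image (e.g. from a saliency model), paste the foreground onto a different background before feeding the view to the self-supervised loss. A minimal NumPy sketch under those assumptions (the function name and mask source are illustrative, not the authors' code):

```python
import numpy as np

def background_swap(image, mask, background):
    """Composite the masked foreground of `image` onto `background`.

    image, background: float arrays of shape (H, W, C).
    mask: float array of shape (H, W); 1.0 on foreground pixels.
    """
    m = mask[..., None]                     # broadcast mask over channels
    return m * image + (1.0 - m) * background

# Toy example: 4x4 image whose foreground is the top-left 2x2 corner,
# composited onto an all-black background.
img = np.full((4, 4, 3), 0.8)
bg = np.zeros((4, 4, 3))
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0
aug = background_swap(img, mask, bg)
```

In a contrastive pipeline, a view augmented this way would be treated like any other view of the same image, discouraging the encoder from relying on background pixels.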
Mon 4:00 p.m. - 5:00 p.m. | Poster session B (Poster session)
Mon 5:00 p.m. - 5:15 p.m. | Closing & awards (Workshop closing)
Author Information
Zeynep Akata (University of Tübingen)
Lucas Beyer (Google Brain Zürich)
Sanghyuk Chun (NAVER AI Lab)
I'm a research scientist and tech lead at NAVER AI Lab, working on machine learning and its applications. In particular, my research interests focus on bridging the gap between two gigantic topics: reliable machine learning (e.g., robustness [C3, C9, C10, W1, W3], de-biasing or domain generalization [C6, A6], uncertainty estimation [C11, A3], explainability [C5, C11, A2, A4, W2], and fair evaluation [C5, C11]) and learning with limited annotations (e.g., multi-modal learning [C11], weakly-supervised learning [C2, C3, C4, C5, C7, C8, C12, W2, W4, W5, W6, A2, A4], and self-supervised learning). I have also contributed large-scale machine learning algorithms [C3, C9, C10, C13] at NAVER AI Lab. Prior to NAVER, I worked as a research engineer on the advanced recommendation team (ART) at Kakao from 2016 to 2018. I received a master's degree in Electrical Engineering from the Korea Advanced Institute of Science and Technology (KAIST) in 2016, where I researched a scalable algorithm for robust subspace clustering (based on robust PCA and k-means clustering). Before my master's study, I worked at IUM-SOCIUS in 2012 as a software engineering intern. I also did research internships at the Networked and Distributed Computing System Lab at KAIST and at NAVER Labs during summer 2013 and fall 2015, respectively.
A. Sophia Koepke (University of Tübingen)
Diane Larlus (NAVER LABS Europe)
Seong Joon Oh (NAVER AI Lab)
Rafael Rezende (NAVER LABS EUROPE)
Sangdoo Yun (Clova AI Research, NAVER Corp.)
Xiaohua Zhai (Google Brain)