Modern deep neural networks for classification usually jointly learn a backbone for representation and a linear classifier that outputs the logit of each class. A recent study has identified a phenomenon called neural collapse: at the terminal phase of training on a balanced dataset, the within-class means of features and the classifier vectors converge to the vertices of a simplex equiangular tight frame (ETF). Since the ETF geometric structure maximally separates the pair-wise angles of all classes in the classifier, it is natural to ask: why spend effort learning a classifier when we already know its optimal geometric structure? In this paper, we study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training. Our analysis based on the layer-peeled model indicates that feature learning with a fixed ETF classifier naturally leads to the neural collapse state even when the dataset is imbalanced among classes. We further show that in this case the cross entropy (CE) loss is not necessary and can be replaced by a simple squared loss that shares the same global optimality but enjoys a better convergence property. Our experimental results show that our method brings significant improvements with faster convergence on multiple imbalanced datasets.
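To make the geometry concrete, here is a minimal sketch of how a simplex ETF classifier matrix could be built and then held fixed. It uses the standard construction W = sqrt(K/(K-1)) · U (I_K − (1/K) 1 1ᵀ), where U has orthonormal columns. It is not the authors' code: the function name `simplex_etf`, its parameters, and the requirement that the feature dimension be at least the number of classes are assumptions made for this illustration.

```python
import numpy as np


def simplex_etf(num_classes, feat_dim, seed=0):
    """Return a (feat_dim, num_classes) matrix whose columns form a simplex ETF.

    Hypothetical helper for illustration; assumes feat_dim >= num_classes >= 2.
    """
    if num_classes < 2 or feat_dim < num_classes:
        raise ValueError("need feat_dim >= num_classes >= 2")
    rng = np.random.default_rng(seed)
    # Random matrix with orthonormal columns (U^T U = I) via reduced QR.
    u, _ = np.linalg.qr(rng.standard_normal((feat_dim, num_classes)))
    k = num_classes
    # Centering matrix I - (1/K) 1 1^T removes the mean direction.
    center = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * (u @ center)


# Example: 10 classes, 64-dimensional features.
w = simplex_etf(num_classes=10, feat_dim=64)
gram = w.T @ w  # diagonal entries are 1, off-diagonal entries are -1/(K-1)
```

Each column has unit norm and every pair of columns has the same inner product, -1/(K-1), which is the maximal equiangular separation the abstract refers to. In a training loop such a matrix would be stored as a constant rather than a trainable parameter, so only the backbone receives gradient updates.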
Author Information
Yibo Yang (Looking for research position. Email me!)
https://iboing.github.io/
Shixiang Chen (Texas A&M)
Xiangtai Li (Peking University)
Liang Xie (Zhejiang University)
Zhouchen Lin (Peking University)
Dacheng Tao (University of Technology, Sydney)
More from the Same Authors
-
2021 Spotlight: Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State »
Mingqing Xiao · Qingyan Meng · Zongpeng Zhang · Yisen Wang · Zhouchen Lin -
2021 : AP-10K: A Benchmark for Animal Pose Estimation in the Wild »
Hang Yu · Yufei Xu · Jing Zhang · Wei Zhao · Ziyu Guan · Dacheng Tao -
2022 Poster: Rethinking Knowledge Graph Evaluation Under the Open-World Assumption »
Haotong Yang · Zhouchen Lin · Muhan Zhang -
2022 Poster: Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach »
Peng Mi · Li Shen · Tianhe Ren · Yiyi Zhou · Xiaoshuai Sun · Rongrong Ji · Dacheng Tao -
2022 Poster: ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation »
Yufei Xu · Jing Zhang · Qiming ZHANG · Dacheng Tao -
2022 Poster: APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking »
Yuxiang Yang · Junjie Yang · Yufei Xu · Jing Zhang · Long Lan · Dacheng Tao -
2022 : Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models »
Xingyu Xie · Pan Zhou · Huan Li · Zhouchen Lin · Shuicheng Yan -
2023 Poster: ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding »
Lunhao Duan · Shanshan Zhao · Nan Xue · Mingming Gong · Gui-Song Xia · Dacheng Tao -
2023 Poster: Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning »
Guozheng Ma · Linrui Zhang · Haoyu Wang · Lu Li · Zilin Wang · Zhen Wang · Li Shen · Xueqian Wang · Dacheng Tao -
2023 : Transformer-Based Large Language Models Are Not General Learners: A Universal Circuit Perspective »
Yang Chen · Yitao Liang · Zhouchen Lin -
2023 : Are Large Language Models Really Robust to Word-Level Perturbations? »
Haoyu Wang · Guozheng Ma · Cong Yu · Gui Ning · Linrui Zhang · Zhiqi Huang · Suwei Ma · Yongzhe Chang · Sen Zhang · Li Shen · Xueqian Wang · Peilin Zhao · Dacheng Tao -
2023 Poster: Understanding How Consistency Works in Federated Learning via Stage-wise Relaxed Initialization »
Yan Sun · Li Shen · Dacheng Tao -
2023 Poster: Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective »
Yifei Wang · Liangchen Li · Jiansheng Yang · Zhouchen Lin · Yisen Wang -
2023 Poster: All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation »
Liyao Tang · Zhe Chen · Shanshan Zhao · Chaoyue Wang · Dacheng Tao -
2023 Poster: Cocktail: Mixing Multi-Modality Control for Text-Conditional Image Generation »
Minghui Hu · Jianbin Zheng · Daqing Liu · Chuanxia Zheng · Chaoyue Wang · Dacheng Tao · Tat-Jen Cham -
2023 Poster: SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model »
Di Wang · Jing Zhang · Bo Du · Minqiang Xu · Lin Liu · Dacheng Tao · Liangpei Zhang -
2023 Poster: Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm »
Miaoxi Zhu · Li Shen · Bo Du · Dacheng Tao -
2023 Poster: Explore In-Context Learning for 3D Point Cloud Understanding »
Zhongbin Fang · Xiangtai Li · Xia Li · Joachim M Buhmann · Chen Change Loy · Mengyuan Liu -
2023 Poster: Domain Re-Modulation for Few-Shot Generative Domain Adaptation »
Yi Wu · Ziqiang Li · Chaoyue Wang · Heliang Zheng · Shanshan Zhao · Bin Li · Dacheng Tao -
2023 Poster: A Single-Loop Accelerated Extra-Gradient Difference Algorithm with Improved Complexity Bounds for Constrained Minimax Optimization »
Yuanyuan Liu · Fanhua Shang · Weixin An · Junhao Liu · Hongying Liu · Zhouchen Lin -
2023 Poster: Extending the Design Space of Graph Neural Networks by Rethinking Folklore Weisfeiler-Lehman »
Jiarui Feng · Lecheng Kong · Hao Liu · Dacheng Tao · Fuhai Li · Muhan Zhang · Yixin Chen -
2023 Poster: GEQ: Gaussian Kernel Inspired Equilibrium Models »
Mingjie Li · Yisen Wang · Zhouchen Lin -
2023 Oral: A Single-Loop Accelerated Extra-Gradient Difference Algorithm with Improved Complexity Bounds for Constrained Minimax Optimization »
Yuanyuan Liu · Fanhua Shang · Weixin An · Junhao Liu · Hongying Liu · Zhouchen Lin -
2023 Poster: VanillaNet: the Power of Minimalism in Deep Learning »
Hanting Chen · Yunhe Wang · Jianyuan Guo · Dacheng Tao -
2023 Poster: Task-Robust Pre-Training for Worst-Case Downstream Adaptation »
Jianghui Wang · Yang Chen · Xingyu Xie · Cong Fang · Zhouchen Lin -
2023 Poster: MAG-GNN: Reinforcement Learning Boosted Graph Neural Network »
Lecheng Kong · Jiarui Feng · Hao Liu · Dacheng Tao · Yixin Chen · Muhan Zhang -
2023 Poster: 4D Panoptic Scene Graph Generation »
Jingkang Yang · Jun CEN · WENXUAN PENG · Shuai Liu · Fangzhou Hong · Xiangtai Li · Kaiyang Zhou · Qifeng Chen · Ziwei Liu -
2022 Spotlight: Escaping from the Barren Plateau via Gaussian Initializations in Deep Variational Quantum Circuits »
Kaining Zhang · Liu Liu · Min-Hsiu Hsieh · Dacheng Tao -
2022 Spotlight: Lightning Talks 4B-4 »
Ziyue Jiang · Zeeshan Khan · Yuxiang Yang · Chenze Shao · Yichong Leng · Zehao Yu · Wenguan Wang · Xian Liu · Zehua Chen · Yang Feng · Qianyi Wu · James Liang · C.V. Jawahar · Junjie Yang · Zhe Su · Songyou Peng · Yufei Xu · Junliang Guo · Michael Niemeyer · Hang Zhou · Zhou Zhao · Makarand Tapaswi · Dongfang Liu · Qian Yang · Torsten Sattler · Yuanqi Du · Haohe Liu · Jing Zhang · Andreas Geiger · Yi Ren · Long Lan · Jiawei Chen · Wayne Wu · Dahua Lin · Dacheng Tao · Xu Tan · Jinglin Liu · Ziwei Liu · 振辉 叶 · Danilo Mandic · Lei He · Xiangyang Li · Tao Qin · sheng zhao · Tie-Yan Liu -
2022 Spotlight: Lightning Talks 4A-3 »
Zhihan Gao · Yabin Wang · Xingyu Qu · Luziwei Leng · Mingqing Xiao · Bohan Wang · Yu Shen · Zhiwu Huang · Xingjian Shi · Qi Meng · Yupeng Lu · Diyang Li · Qingyan Meng · Kaiwei Che · Yang Li · Hao Wang · Huishuai Zhang · Zongpeng Zhang · Kaixuan Zhang · Xiaopeng Hong · Xiaohan Zhao · Di He · Jianguo Zhang · Yaofeng Tu · Bin Gu · Yi Zhu · Ruoyu Sun · Yuyang (Bernie) Wang · Zhouchen Lin · Qinghu Meng · Wei Chen · Wentao Zhang · Bin CUI · Jie Cheng · Zhi-Ming Ma · Mu Li · Qinghai Guo · Dit-Yan Yeung · Tie-Yan Liu · Jianxing Liao -
2022 Spotlight: Online Training Through Time for Spiking Neural Networks »
Mingqing Xiao · Qingyan Meng · Zongpeng Zhang · Di He · Zhouchen Lin -
2022 Spotlight: APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking »
Yuxiang Yang · Junjie Yang · Yufei Xu · Jing Zhang · Long Lan · Dacheng Tao -
2022 Spotlight: Adversarial Auto-Augment with Label Preservation: A Representation Learning Principle Guided Approach »
Kaiwen Yang · Yanchao Sun · Jiahao Su · Fengxiang He · Xinmei Tian · Furong Huang · Tianyi Zhou · Dacheng Tao -
2022 Poster: CGLB: Benchmark Tasks for Continual Graph Learning »
Xikun Zhang · Dongjin Song · Dacheng Tao -
2022 Poster: Escaping from the Barren Plateau via Gaussian Initializations in Deep Variational Quantum Circuits »
Kaining Zhang · Liu Liu · Min-Hsiu Hsieh · Dacheng Tao -
2022 Poster: Benefits of Permutation-Equivariance in Auction Mechanisms »
Tian Qin · Fengxiang He · Dingfeng Shi · Wenbing Huang · Dacheng Tao -
2022 Poster: Towards Theoretically Inspired Neural Initialization Optimization »
Yibo Yang · Hong Wang · Haobo Yuan · Zhouchen Lin -
2022 Poster: Adversarial Auto-Augment with Label Preservation: A Representation Learning Principle Guided Approach »
Kaiwen Yang · Yanchao Sun · Jiahao Su · Fengxiang He · Xinmei Tian · Furong Huang · Tianyi Zhou · Dacheng Tao -
2022 Poster: Online Training Through Time for Spiking Neural Networks »
Mingqing Xiao · Qingyan Meng · Zongpeng Zhang · Di He · Zhouchen Lin -
2021 Poster: On Training Implicit Models »
Zhengyang Geng · Xin-Yu Zhang · Shaojie Bai · Yisen Wang · Zhouchen Lin -
2021 Poster: Dissecting the Diffusion Process in Linear Graph Convolutional Networks »
Yifei Wang · Yisen Wang · Jiansheng Yang · Zhouchen Lin -
2021 Poster: Class-Disentanglement and Applications in Adversarial Detection and Defense »
Kaiwen Yang · Tianyi Zhou · Yonggang Zhang · Xinmei Tian · Dacheng Tao -
2021 Poster: Gauge Equivariant Transformer »
Lingshen He · Yiming Dong · Yisen Wang · Dacheng Tao · Zhouchen Lin -
2021 Poster: ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias »
Yufei Xu · Qiming ZHANG · Jing Zhang · Dacheng Tao -
2021 Poster: Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State »
Mingqing Xiao · Qingyan Meng · Zongpeng Zhang · Yisen Wang · Zhouchen Lin -
2021 Poster: Efficient Equivariant Network »
Lingshen He · Yuxuan Chen · zhengyang shen · Yiming Dong · Yisen Wang · Zhouchen Lin -
2021 Poster: Residual Relaxation for Multi-view Representation Learning »
Yifei Wang · Zhengyang Geng · Feng Jiang · Chuming Li · Yisen Wang · Jiansheng Yang · Zhouchen Lin -
2020 Poster: ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding »
Yibo Yang · Hongyang Li · Shan You · Fei Wang · Chen Qian · Zhouchen Lin -
2018 Workshop: NIPS 2018 workshop on Compact Deep Neural Networks with industrial applications »
Lixin Fan · Zhouchen Lin · Max Welling · Yurong Chen · Werner Bailer -
2018 Poster: SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator »
Cong Fang · Chris Junchi Li · Zhouchen Lin · Tong Zhang -
2018 Spotlight: SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator »
Cong Fang · Chris Junchi Li · Zhouchen Lin · Tong Zhang -
2018 Poster: Joint Sub-bands Learning with Clique Structures for Wavelet Domain Super-Resolution »
Zhisheng Zhong · Tiancheng Shen · Yibo Yang · Zhouchen Lin · Chao Zhang -
2018 Poster: Learning Versatile Filters for Efficient Convolutional Neural Networks »
Yunhe Wang · Chang Xu · Chunjing XU · Chao Xu · Dacheng Tao -
2017 Poster: Faster and Non-ergodic O(1/K) Stochastic Alternating Direction Method of Multipliers »
Cong Fang · Feng Cheng · Zhouchen Lin -
2015 Poster: Accelerated Proximal Gradient Methods for Nonconvex Programming »
Huan Li · Zhouchen Lin