Weakly supervised object detection (WSOD) has attracted extensive research attention because it can exploit large-scale datasets with only image-level annotations for detector training. Despite great advances in recent years, WSOD still suffers from limited performance, which is far below that of fully supervised object detection (FSOD). This is because most WSOD methods depend on object proposal algorithms to generate candidate regions and are also confronted with challenges such as low-quality predicted bounding boxes and large scale variation. In this paper, we propose a unified WSOD framework, termed UWSOD, to develop a high-capacity general detection model with only image-level labels, which is self-contained and does not require external modules or additional supervision. To this end, we exploit three important components, i.e., object proposal generation, bounding-box fine-tuning and scale-invariant features. First, we propose an anchor-based self-supervised proposal generator to hypothesize object locations, which is trained end-to-end with supervision created by UWSOD for both objectness classification and regression. Second, we develop a step-wise bounding-box fine-tuning to refine both detection scores and coordinates by progressively selecting high-confidence object proposals as positive samples, which bootstraps the quality of predicted bounding boxes. Third, we construct a multi-rate resampling pyramid to aggregate multi-scale contextual information, which is the first in-network feature hierarchy to handle scale variation in WSOD. Extensive experiments on PASCAL VOC and MS COCO show that the proposed UWSOD achieves results competitive with state-of-the-art WSOD methods while not requiring external modules or additional supervision. Moreover, the upper-bound performance of UWSOD with class-agnostic ground-truth bounding boxes approaches that of Faster R-CNN, which demonstrates that UWSOD has fully-supervised-level capacity.
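To make the idea of selecting high-confidence proposals as positive samples more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple rule chosen here for illustration: the top-scoring proposal for a class serves as a pseudo ground truth, and every proposal that overlaps it sufficiently is labelled positive. The names `iou` and `select_positives` and the `iou_threshold` value of 0.5 are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_positives(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[int]:
    """Return indices of proposals treated as positives for one class.

    Assumed rule: the highest-scoring proposal is the pseudo ground truth,
    and proposals with IoU >= iou_threshold against it become positives.
    """
    if not boxes:
        return []
    seed = max(range(len(boxes)), key=lambda i: scores[i])
    return [i for i, b in enumerate(boxes) if iou(b, boxes[seed]) >= iou_threshold]


# Toy example: two overlapping proposals and one far away.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.6, 0.2]
positives = select_positives(boxes, scores)
print(positives)
```

In a step-wise scheme, a selection like this would be repeated across refinement stages, with each stage supervised by the positives chosen from the previous stage's scores; how UWSOD schedules those stages is not specified in the abstract.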
Author Information
Yunhang Shen (Xiamen University)
Rongrong Ji (Xiamen University, China)
Zhiwei Chen (Xiamen University)
Yongjian Wu (Tencent Technology (Shanghai) Co.,Ltd)
Feiyue Huang (Tencent)
More from the Same Authors
2022 Poster: Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
Peng Mi · Li Shen · Tianhe Ren · Yiyi Zhou · Xiaoshuai Sun · Rongrong Ji · Dacheng Tao
2022 Poster: PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining
Yuting Gao · Jinfeng Liu · Zihan Xu · Jun Zhang · Ke Li · Rongrong Ji · Chunhua Shen
2022 Poster: Learning Best Combination for Efficient N:M Sparsity
Yuxin Zhang · Mingbao Lin · ZhiHang Lin · Yiting Luo · Ke Li · Fei Chao · Yongjian Wu · Rongrong Ji
2021 Poster: Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model
Jiangning Zhang · Chao Xu · Jian Li · Wenzhou Chen · Yabiao Wang · Ying Tai · Shuo Chen · Chengjie Wang · Feiyue Huang · Yong Liu
2020 Poster: Rotated Binary Neural Network
Mingbao Lin · Rongrong Ji · Zihan Xu · Baochang Zhang · Yan Wang · Yongjian Wu · Feiyue Huang · Chia-Wen Lin
2019 Poster: Variational Structured Semantic Inference for Diverse Image Captioning
Fuhai Chen · Rongrong Ji · Jiayi Ji · Xiaoshuai Sun · Baochang Zhang · Xuri Ge · Yongjian Wu · Feiyue Huang · Yan Wang
2019 Poster: FreeAnchor: Learning to Match Anchors for Visual Object Detection
Xiaosong Zhang · Fang Wan · Chang Liu · Rongrong Ji · Qixiang Ye
2019 Poster: Information Competing Process for Learning Diversified Representations
Jie Hu · Rongrong Ji · ShengChuan Zhang · Xiaoshuai Sun · Qixiang Ye · Chia-Wen Lin · Qi Tian