Despite the exciting progress in image captioning, generating diverse captions for a given image remains an open problem. Existing methods typically apply generative models such as the Variational Auto-Encoder to diversify the captions, but they neglect two key factors of diverse expression: lexical diversity and syntactic diversity. To model these two inherent diversities in image captioning, we propose a Variational Structured Semantic Inferring model (termed VSSI-cap) executed in a novel structured encoder-inferer-decoder schema. The main innovation of VSSI-cap is a novel structure, the Variational Multi-modal Inferring tree (termed VarMI-tree). In particular, conditioned on the visual-textual features from the encoder, the VarMI-tree models the lexical and syntactic diversities by inferring their latent variables (with variations) in an approximate posterior inference guided by a visual semantic prior. Then, a reconstruction loss and the posterior-prior KL-divergence are jointly estimated to optimize the VSSI-cap model. Finally, diverse captions are generated from the visual features and the latent variables of this structured encoder-inferer-decoder model. Experiments on the benchmark dataset show that the proposed VSSI-cap achieves significant improvements over state-of-the-art methods.
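The abstract does not give the objective in closed form. As a rough illustration only, the sketch below shows how a reconstruction loss and a posterior-prior KL term are commonly combined in a generic conditional VAE, assuming diagonal Gaussian posterior and prior distributions. It is not the VSSI-cap or VarMI-tree implementation; the function names (`gaussian_kl`, `reconstruction_nll`, `cvae_loss`) and the `kl_weight` parameter are hypothetical.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions.

    Assumption: both the approximate posterior q and the prior p are
    diagonal Gaussians given as lists of means and standard deviations.
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def reconstruction_nll(token_probs):
    """Negative log-likelihood of the reference caption tokens.

    token_probs holds the probability a (hypothetical) decoder assigns
    to each ground-truth token.
    """
    return -sum(math.log(p) for p in token_probs)


def cvae_loss(token_probs, mu_q, sigma_q, mu_p, sigma_p, kl_weight=1.0):
    """Reconstruction loss plus weighted posterior-prior KL divergence."""
    kl = gaussian_kl(mu_q, sigma_q, mu_p, sigma_p)
    return reconstruction_nll(token_probs) + kl_weight * kl
```

In a structured model of the kind the abstract describes, a term like `gaussian_kl` would presumably be evaluated for each latent variable and the results summed, but that detail is an assumption here rather than something the abstract states.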
Author Information
Fuhai Chen (Xiamen University)
Fuhai Chen is currently a final-year Ph.D. student in the Artificial Intelligence Department of Xiamen University, advised by Prof. Rongrong Ji. He received his B.S. degree in Cognitive Science and Technology from Xiamen University in 2014. He obtained the M.S.-Ph.D. qualification and completed his M.S. at Xiamen University in 2016. His research interests are in Computer Vision, Multimedia and Machine Learning. He is currently seeking a postdoctoral position.
Rongrong Ji (Xiamen University, China)
Jiayi Ji (Xiamen University)
Xiaoshuai Sun (Xiamen University)
Baochang Zhang (Beihang University)
Xuri Ge (Xiamen University)
Yongjian Wu (Tencent Technology (Shanghai) Co.,Ltd)
Feiyue Huang (Tencent)
Yan Wang (Microsoft)
More from the Same Authors
- 2022 Poster: Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
  Peng Mi · Li Shen · Tianhe Ren · Yiyi Zhou · Xiaoshuai Sun · Rongrong Ji · Dacheng Tao
- 2022 Poster: FNeVR: Neural Volume Rendering for Face Animation
  Bohan Zeng · Boyu Liu · Hong Li · Xuhui Liu · Jianzhuang Liu · Dapeng Chen · Wei Peng · Baochang Zhang
- 2022 Poster: PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining
  Yuting Gao · Jinfeng Liu · Zihan Xu · Jun Zhang · Ke Li · Rongrong Ji · Chunhua Shen
- 2023 Poster: Discover and Align Taxonomic Context Priors for Open-world Semi-Supervised Learning
  Yu Wang · Zhun Zhong · Pengchong Qiao · Xuxin Cheng · Xiawu Zheng · Chang Liu · Nicu Sebe · Rongrong Ji · Jie Chen
- 2023 Poster: Improving Adversarial Robustness via Information Bottleneck Distillation
  Huafeng Kuang · Hong Liu · Shin'ichi Satoh · Yongjian Wu · Rongrong Ji
- 2023 Poster: Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
  Gen Luo · Yiyi Zhou · Tianhe Ren · Shengxin Chen · Xiaoshuai Sun · Rongrong Ji
- 2023 Poster: Q-DM: An Efficient Low-bit Quantized Diffusion Model
  Yanjing Li · Sheng Xu · Xianbin Cao · Baochang Zhang · Xiao Sun
- 2023 Poster: Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
  Qiong Wu · Wei Yu · Yiyi Zhou · Shubin Huang · Xiaoshuai Sun · Rongrong Ji
- 2023 Poster: CAPro: Webly Supervised Learning with Cross-modality Aligned Prototypes
  Yulei Qin · Xingyu Chen · Yunhang Shen · Chaoyou Fu · Yun Gu · Ke Li · Xing Sun · Rongrong Ji
- 2022 Poster: Learning Best Combination for Efficient N:M Sparsity
  Yuxin Zhang · Mingbao Lin · ZhiHang Lin · Yiting Luo · Ke Li · Fei Chao · Yongjian Wu · Rongrong Ji
- 2022 Poster: Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer
  Yanjing Li · Sheng Xu · Baochang Zhang · Xianbin Cao · Peng Gao · Guodong Guo
- 2021 Poster: Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model
  Jiangning Zhang · Chao Xu · Jian Li · Wenzhou Chen · Yabiao Wang · Ying Tai · Shuo Chen · Chengjie Wang · Feiyue Huang · Yong Liu
- 2021 Poster: Dual-stream Network for Visual Recognition
  Mingyuan Mao · peng gao · Renrui Zhang · Honghui Zheng · Teli Ma · Yan Peng · Errui Ding · Baochang Zhang · Shumin Han
- 2020 Poster: Rotated Binary Neural Network
  Mingbao Lin · Rongrong Ji · Zihan Xu · Baochang Zhang · Yan Wang · Yongjian Wu · Feiyue Huang · Chia-Wen Lin
- 2020 Poster: UWSOD: Toward Fully-Supervised-Level Capacity Weakly Supervised Object Detection
  Yunhang Shen · Rongrong Ji · Zhiwei Chen · Yongjian Wu · Feiyue Huang
- 2019 Poster: FreeAnchor: Learning to Match Anchors for Visual Object Detection
  Xiaosong Zhang · Fang Wan · Chang Liu · Rongrong Ji · Qixiang Ye
- 2019 Poster: Information Competing Process for Learning Diversified Representations
  Jie Hu · Rongrong Ji · ShengChuan Zhang · Xiaoshuai Sun · Qixiang Ye · Chia-Wen Lin · Qi Tian