
Learning to Generate Visual Questions with Noisy Supervision
Shen Kai · Lingfei Wu · Siliang Tang · Yueting Zhuang · Zhen He · Zhuoye Ding · Yun Xiao · Bo Long

Thu Dec 09 08:30 AM -- 10:00 AM (PST)

The task of visual question generation (VQG) aims to generate human-like natural questions from an image and potentially other side information (e.g., the answer type or the answer itself). Existing works often suffer from the severe one-image-to-many-questions mapping problem, which yields uninformative and non-referential questions. Recent work has demonstrated that by leveraging double hints (visual and answer hints), a model can generate questions of much higher quality. However, visual hints are not naturally available. Although a simple rule-based similarity matching method has been proposed to obtain candidate visual hints, these candidates can be very noisy in practice and thus limit the quality of the generated questions. In this paper, we present a novel learning approach for double-hints-based VQG, which can be cast as a weakly supervised learning problem with noise. The key rationale is that the salient visual regions of interest can be viewed as a constraint that improves the generation procedure and produces high-quality questions. As a result, given the predicted salient visual regions of interest, we can focus on estimating the probability of a question being the ground truth, which in turn implicitly measures the quality of the predicted visual hints. Experimental results on two benchmark datasets show that our proposed method outperforms state-of-the-art approaches by a large margin on a variety of metrics, including both automatic machine metrics and human evaluation.
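The key rationale above can be illustrated with a toy sketch: a candidate region proposed by rule-based matching is kept as a visual hint only if it also explains the ground-truth question well. The function name, the per-region scores, and the simple additive combination below are all hypothetical simplifications for illustration, not the paper's actual learned model:

```python
def select_hints(saliency, question_logprob, top_k=2):
    """Toy hint selection: combine each candidate region's rule-based
    saliency score with the log-likelihood of the ground-truth question
    given that region as a visual hint, then keep the top-k regions.

    Both inputs are hypothetical per-region scores; in the actual
    approach these quantities would come from learned networks.
    """
    combined = [s + q for s, q in zip(saliency, question_logprob)]
    ranked = sorted(range(len(combined)), key=lambda i: combined[i],
                    reverse=True)
    return sorted(ranked[:top_k])

# Region 0 looks salient under rule-based matching (0.9) but, used as a
# hint, explains the ground-truth question poorly (log-prob -2.0), so
# the question likelihood acts as a filter against noisy candidates.
print(select_hints([0.9, 0.2, 0.6], [-2.0, -0.1, -0.4]))  # -> [1, 2]
```

In the example, the region with the highest rule-based saliency is dropped in favor of regions whose hints make the ground-truth question likely, which is the sense in which question probability implicitly measures hint quality.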

Author Information

Shen Kai (Zhejiang University)
Lingfei Wu (JD.COM Silicon Valley Research Center)

Dr. Lingfei Wu earned his Ph.D. degree in computer science from the College of William and Mary in 2016. He is a research staff member at IBM Research and leads a research team (10+ RSMs) developing novel Graph Neural Networks for various tasks, work that led to the #1 AI Challenge Project in IBM Research and multiple IBM awards, including the Outstanding Technical Achievement Award. He has published more than 70 top-ranked conference and journal papers and is a co-inventor on more than 30 filed US patents. Because of the high commercial value of his patents, he has received several invention achievement awards and was appointed an IBM Master Inventor, class of 2020. He received the Best Paper Award or Best Student Paper Award at several venues, including IEEE ICC'19, the AAAI workshop on DLGMA'20, and the KDD workshop on DLG'19. His research has been featured in numerous media outlets, including Nature News, Yahoo News, VentureBeat, and TechTalks. He has co-organized 10+ conferences (AAAI, IEEE BigData) and is the founding co-chair of the Workshop on Deep Learning on Graphs (with AAAI'21, AAAI'20, KDD'20, KDD'19, and IEEE BigData'19). He currently serves as Associate Editor for IEEE Transactions on Neural Networks and Learning Systems, ACM Transactions on Knowledge Discovery from Data, and the International Journal of Intelligent Systems, and regularly serves as an SPC/PC member of major AI/ML/NLP conferences, including KDD, IJCAI, AAAI, NIPS, ICML, ICLR, and ACL.

Siliang Tang (Zhejiang University)

Dr. Siliang Tang is currently an associate professor at the College of Computer Science, Zhejiang University. In 2012, he received his Ph.D. degree from the National University of Ireland, Maynooth, Ireland. His research interests include Information Extraction, Knowledge Base Construction, and Multimodal Data Analysis. So far, he has published more than 70 papers in top-tier scientific conferences/journals such as AAAI, IJCAI (Artificial Intelligence); ACL, EMNLP, NAACL, SIGIR, IEEE TKDE (NLP and Information Extraction); ACM MM, CVPR, IEEE Trans. on Multimedia (Multimodal Understanding and Reasoning); IEEE Trans. on Image Processing, IEEE Trans. on Circuits and Systems for Video Technology (Image Processing and Understanding); and IEEE VIS, IEEE Trans. on Visualization and Computer Graphics (Data Visualization). He has served as an area chair or program committee member at conferences such as NIPS, ICML, AAAI, IJCAI, ACL, EMNLP, and NAACL, and as a reviewer for journals such as IEEE TIP, IEEE TMM, IEEE TSMC, ACM Computing Surveys, and Nature Scientific Reports. At present, he is mainly working for CKCEST on projects for automatic knowledge base construction from semi-structured and unstructured text. He and his team have participated in the NIST TAC (https://tac.nist.gov/) knowledge base population competition since 2015 and have placed in the top three in several tracks (English EDL 2016, 1st place; TEDL 2017, 2nd place; DDI 2018, tasks 1 & 2, 1st place).

Yueting Zhuang (Zhejiang University)
Zhen He (Nankai University)
Zhuoye Ding (JD.com)
Yun Xiao (JD.COM Silicon Valley Research Center)
Bo Long (State University of New York, Binghamton)
