Text-to-image generation and image captioning have recently emerged as a new experimental paradigm for assessing machine intelligence. These tasks predict continuous quantities and rely on sampling techniques during generation, which complicates evaluation and makes the marginal distributions intractable to obtain. Following the recent trend of using a vision-and-language pre-trained model for multimodal generative evaluation, we propose the negative Gaussian cross-mutual information computed on CLIP features as a unified metric, coined Mutual Information Divergence (MID). To validate it, we extensively compare it with competing metrics using carefully generated or human-annotated judgments in text-to-image generation and image captioning tasks. The proposed MID significantly outperforms the competing methods in consistency across benchmarks, sample parsimony, and robustness to the choice of CLIP model. We look forward to seeing the underexplored implications of the Gaussian cross-mutual information in multimodal representation learning and future work based on this novel proposition.
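As a rough illustration of the quantity the metric builds on, the sketch below estimates the mutual information between two sets of feature vectors under a joint Gaussian assumption, using the standard closed form I(X; Y) = ½ (log det Σ_x + log det Σ_y − log det Σ). This is not the paper's exact MID formula, which the abstract does not spell out. The function name `gaussian_mutual_information`, the `eps` regularizer, and the random arrays standing in for CLIP image and text features are all assumptions made for this example.

```python
import numpy as np


def gaussian_mutual_information(x, y, eps=1e-6):
    """Estimate I(X; Y) in nats, assuming (X, Y) is jointly Gaussian.

    x: array of shape (n_samples, dx), e.g. image features (hypothetical).
    y: array of shape (n_samples, dy), e.g. text features (hypothetical).
    eps: small ridge added to the covariance diagonal for stability
         (an assumption of this sketch, not taken from the source).
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    dx = x.shape[1]

    # Sample covariance of the concatenated features [x, y].
    joint = np.concatenate([x, y], axis=1)
    cov = np.cov(joint, rowvar=False)
    cov = cov + eps * np.eye(cov.shape[0])

    # Log-determinants of the marginal and joint covariance matrices.
    _, logdet_x = np.linalg.slogdet(cov[:dx, :dx])
    _, logdet_y = np.linalg.slogdet(cov[dx:, dx:])
    _, logdet_joint = np.linalg.slogdet(cov)

    return 0.5 * (logdet_x + logdet_y - logdet_joint)


# Toy usage: random vectors stand in for real CLIP features.
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(2000, 4))
matched_text = image_feats + 0.1 * rng.normal(size=(2000, 4))
unrelated_text = rng.normal(size=(2000, 4))

mi_matched = gaussian_mutual_information(image_feats, matched_text)
mi_unrelated = gaussian_mutual_information(image_feats, unrelated_text)
```

In this toy setup the matched pair should score far higher than the unrelated pair, which is the behaviour one would want from an image-text alignment score. With real CLIP features the dimensionality is much larger, so the sample count and the regularizer would matter considerably more.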
Author Information
Jin-Hwa Kim (NAVER AI Lab)
Jin-Hwa Kim has been a Technical Leader and Research Scientist at NAVER AI Lab since August 2021 and a Guest Assistant Professor at the Artificial Intelligence Institute of Seoul National University (SNU AIIS) since August 2022. He studies multimodal deep learning (e.g., [visual question answering](http://visualqa.org)), multimodal generation, ethical AI, and related topics. In 2018, he received his Ph.D. from Seoul National University under the supervision of Professor [Byoung-Tak Zhang](https://bi.snu.ac.kr/~btzhang/) for his work on "Multimodal Deep Learning for Visually-grounded Reasoning." He received the [2017 Google Ph.D. Fellowship](https://ai.googleblog.com/2017/09/highlights-from-annual-google-phd.html) in Machine Learning in September 2017 and a Ph.D. Completion Scholarship from Seoul National University, and he was a runner-up in the VQA Challenge 2018 at the [CVPR 2018 VQA Challenge and Visual Dialog Workshop](https://visualqa.org/workshop_2018.html). He was a Research Intern at [Facebook AI Research](https://research.fb.com/category/facebook-ai-research/) (Menlo Park, CA) from January to May 2017, mentored by [Yuandong Tian](http://yuandong-tian.com), [Devi Parikh](https://www.cc.gatech.edu/~parikh/), and [Dhruv Batra](https://www.cc.gatech.edu/~dbatra/). He previously worked for SK Telecom (August 2018 to July 2021) and SK Communications (January 2011 to October 2012).
Yunji Kim (NAVER AI)
Jiyoung Lee (NAVER)
Kang Min Yoo (NAVER)
Sang-Woo Lee (Korea Advanced Institute of Science & Technology)
More from the Same Authors
- 2023 Poster: Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
  Jeonghoon Kim · Jung Hyun Lee · Sungdong Kim · Joonsuk Park · Kang Min Yoo · Se Jung Kwon · Dongsoo Lee
- 2022 Poster: SelecMix: Debiased Learning by Contradicting-pair Sampling
  Inwoo Hwang · Sangjun Lee · Yunhyeok Kwak · Seong Joon Oh · Damien Teney · Jin-Hwa Kim · Byoung-Tak Zhang
- 2022 Poster: Understanding Cross-Domain Few-Shot Learning Based on Domain Similarity and Few-Shot Difficulty
  Jaehoon Oh · Sungnyun Kim · Namgyu Ho · Jin-Hwa Kim · Hwanjun Song · Se-Young Yun
- 2019 Poster: Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction
  Yunji Kim · Seonghyeon Nam · In Cho · Seon Joo Kim
- 2018 Poster: Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language
  Seonghyeon Nam · Yunji Kim · Seon Joo Kim
- 2018 Spotlight: Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language
  Seonghyeon Nam · Yunji Kim · Seon Joo Kim
- 2018 Poster: Bilinear Attention Networks
  Jin-Hwa Kim · Jaehyun Jun · Byoung-Tak Zhang
- 2017 Poster: Overcoming Catastrophic Forgetting by Incremental Moment Matching
  Sang-Woo Lee · Jin-Hwa Kim · Jaehyun Jun · Jung-Woo Ha · Byoung-Tak Zhang
- 2017 Spotlight: Overcoming Catastrophic Forgetting by Incremental Moment Matching
  Sang-Woo Lee · Jin-Hwa Kim · Jaehyun Jun · Jung-Woo Ha · Byoung-Tak Zhang
- 2016 Poster: Multimodal Residual Learning for Visual QA
  Jin-Hwa Kim · Sang-Woo Lee · Donghyun Kwak · Min-Oh Heo · Jeonghee Kim · Jung-Woo Ha · Byoung-Tak Zhang