
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
Jin-Hwa Kim · Yunji Kim · Jiyoung Lee · Kang Min Yoo · Sang-Woo Lee

Thu Dec 01 09:00 AM -- 11:00 AM (PST) @ Hall J #900

Text-to-image generation and image captioning have recently emerged as a new experimental paradigm for assessing machine intelligence. These models predict continuous quantities coupled with their sampling techniques during generation, which makes evaluation complicated and marginal distributions intractable to obtain. Following the recent trend of evaluating multimodal generative models with a vision-and-language pre-trained model, we propose the negative Gaussian cross-mutual information using CLIP features as a unified metric, coined Mutual Information Divergence (MID). To validate it, we extensively compare it with competing metrics using carefully generated or human-annotated judgments on text-to-image generation and image captioning tasks. The proposed MID significantly outperforms competing methods in consistency across benchmarks, sample parsimony, and robustness to the choice of CLIP model. We look forward to seeing the underexplored implications of Gaussian cross-mutual information in multimodal representation learning and to future work building on this novel proposition.
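The core quantity behind MID is the mutual information of jointly Gaussian variables, which has a closed form in terms of covariance determinants: I(X; Y) = ½ log(det Σ_X · det Σ_Y / det Σ_XY). The sketch below estimates this Gaussian mutual information from paired embeddings (e.g., CLIP image and text features). It is a minimal illustration of the Gaussian MI estimator only, not the authors' exact MID formulation (which uses *cross*-mutual information against reference statistics); the function name and the regularization constant `eps` are assumptions for this example.

```python
import numpy as np

def gaussian_mutual_information(x, y, eps=1e-6):
    """Estimate Gaussian mutual information between paired feature sets.

    x, y: (n, d) arrays of paired embeddings, e.g., CLIP image/text features.
    Under a joint-Gaussian assumption:
        I(X; Y) = 0.5 * log(det Sx * det Sy / det Sxy)
    where Sx, Sy are the marginal covariances and Sxy is the covariance
    of the concatenated features.
    """
    xy = np.concatenate([x, y], axis=1)

    def cov(a):
        c = np.cov(a, rowvar=False)
        # Small diagonal regularization keeps the log-determinant finite
        # when the empirical covariance is near-singular.
        return c + eps * np.eye(c.shape[0])

    # slogdet avoids numerical under/overflow for high-dimensional features.
    _, ld_x = np.linalg.slogdet(cov(x))
    _, ld_y = np.linalg.slogdet(cov(y))
    _, ld_xy = np.linalg.slogdet(cov(xy))
    return 0.5 * (ld_x + ld_y - ld_xy)
```

As a sanity check, strongly correlated pairs should yield a large estimate, while independent samples should yield a value near zero; MID builds on this quantity by evaluating it with CLIP features and reference statistics.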

Author Information

Jin-Hwa Kim (NAVER AI Lab)

Jin-Hwa Kim has been a Technical Leader and Research Scientist at NAVER AI Lab since August 2021 and a Guest Assistant Professor at the Artificial Intelligence Institute of Seoul National University (SNU AIIS) since August 2022. He studies multimodal deep learning (e.g., [visual question answering](http://visualqa.org)), multimodal generation, ethical AI, and related topics. In 2018, he received his Ph.D. from Seoul National University under the supervision of Professor [Byoung-Tak Zhang](https://bi.snu.ac.kr/~btzhang/) for his work on "Multimodal Deep Learning for Visually-grounded Reasoning." In September 2017, he received a [2017 Google Ph.D. Fellowship](https://ai.googleblog.com/2017/09/highlights-from-annual-google-phd.html) in Machine Learning and a Ph.D. Completion Scholarship from Seoul National University, and his team was a runner-up in the VQA Challenge 2018 at the [CVPR 2018 VQA Challenge and Visual Dialog Workshop](https://visualqa.org/workshop_2018.html). He was a Research Intern at [Facebook AI Research](https://research.fb.com/category/facebook-ai-research/) (Menlo Park, CA), mentored by [Yuandong Tian](http://yuandong-tian.com), [Devi Parikh](https://www.cc.gatech.edu/~parikh/), and [Dhruv Batra](https://www.cc.gatech.edu/~dbatra/), from January to May 2017. He previously worked at SK Telecom (August 2018 to July 2021) and SK Communications (January 2011 to October 2012).

Yunji Kim (NAVER AI)
Jiyoung Lee (NAVER)
Kang Min Yoo (NAVER)
Sang-Woo Lee (Korea Advanced Institute of Science & Technology)
