Poster
Improved Multimodal Deep Learning with Variation of Information
Kihyuk Sohn · Wenling Shang · Honglak Lee

Mon Dec 08 04:00 PM -- 08:59 PM (PST) @ Level 2, room 210D

Deep learning has been successfully applied to multimodal representation learning problems, with a common strategy being to learn joint representations shared across multiple modalities on top of layers of modality-specific networks. Nonetheless, the question of how to learn a good association between data modalities remains open; in particular, a good generative model of multimodal data should be able to reason about a missing data modality given the remaining modalities. In this paper, we propose a novel multimodal representation learning framework that explicitly aims at this goal. Rather than learning with maximum likelihood, we train the model to minimize the variation of information. We provide theoretical insight into why the proposed learning objective is sufficient to estimate the data-generating joint distribution of multimodal data. We apply our method to restricted Boltzmann machines and introduce learning methods based on contrastive divergence and multi-prediction training. In addition, we extend our method to deep networks with a recurrent encoding structure to fine-tune the whole network. In experiments, we demonstrate state-of-the-art visual recognition performance on the MIR-Flickr and PASCAL VOC 2007 databases, with and without text features.
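To make the learning objective concrete: the variation of information between two modalities X and Y is VI(X;Y) = H(X|Y) + H(Y|X), which is zero exactly when each modality fully determines the other. The sketch below (a toy illustration for a small discrete joint distribution, not the paper's RBM-based training procedure) computes this quantity directly from a joint probability table.

```python
import numpy as np

def variation_of_information(joint):
    """VI(X;Y) = H(X|Y) + H(Y|X) in nats, for a discrete joint p(x, y).

    `joint` is a 2-D array of probabilities summing to 1.
    Uses the identity H(X|Y) = H(X, Y) - H(Y).
    """
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)  # marginal p(x)
    py = joint.sum(axis=0)  # marginal p(y)

    def entropy(p):
        p = p[p > 0]  # drop zero-probability cells to avoid log(0)
        return -np.sum(p * np.log(p))

    h_xy = entropy(joint.ravel())
    return (h_xy - entropy(py)) + (h_xy - entropy(px))

# Perfectly aligned modalities: each determines the other, so VI = 0.
aligned = np.eye(2) / 2
print(variation_of_information(aligned))       # → 0.0

# Independent modalities: VI = H(X) + H(Y) = 2 log 2 ≈ 1.3863.
independent = np.full((2, 2), 0.25)
print(variation_of_information(independent))
```

Minimizing VI thus pushes the model toward joint distributions in which either modality can be predicted from the other, which is why it can be decomposed into the two conditional log-likelihood terms the paper trains with.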

Author Information

Kihyuk Sohn (Google)
Wendy Shang (University of Michigan)
Honglak Lee (Google / U. Michigan)
