
Demonstration

Toronto Deep Learning

Jamie Kiros · Russ Salakhutdinov · Nitish Srivastava · Yichuan Charlie Tang

Level 2, room 230B

Abstract:

We demonstrate an interactive system for tagging, retrieving, and generating sentence descriptions for images. Our models learn a multimodal vector space, using deep convolutional networks to encode images and long short-term memory (LSTM) recurrent networks to encode sentences. A highly structured multimodal neural language model is used for decoding, generating image descriptions from scratch.
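The retrieval side of a system like this reduces to nearest-neighbor search in the joint space: once the encoders map images and sentences to vectors, ranking by cosine similarity finds the best caption for an image (or vice versa). A minimal sketch, assuming the encoder outputs are already available; `retrieve_sentences` and the toy random vectors are hypothetical stand-ins, not the demonstrated system's code:

```python
import numpy as np

def l2_normalize(X):
    # Normalize rows to unit length so dot products equal cosine similarity.
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def retrieve_sentences(image_vec, sentence_vecs, k=3):
    """Rank sentences by cosine similarity to one image embedding.

    Both inputs are assumed to already live in the joint multimodal space
    (i.e., they are CNN / LSTM encoder outputs in a real system).
    """
    sims = l2_normalize(sentence_vecs) @ (image_vec / np.linalg.norm(image_vec))
    return np.argsort(-sims)[:k]

# Toy demo: 5 sentence embeddings; the image embedding is near sentence 2.
rng = np.random.default_rng(0)
sentences = rng.standard_normal((5, 8))
image = sentences[2] + 0.01 * rng.standard_normal(8)
print(retrieve_sentences(image, sentences))  # index 2 ranks first
```

The same routine runs in the other direction (sentence query, image database) because the space is shared across modalities.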

Alongside this, we will also showcase a mobile app with which a user can take pictures on their phone (for example, of objects in the demonstration room) and have them classified in real time.
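The classification step in such an app is, at its core, a softmax over the network's output scores. A minimal sketch of that final step, with hypothetical logits and labels (the actual app's model and label set are not specified here):

```python
import numpy as np

def classify(logits, labels):
    # Softmax over the network's output logits (shifted for numerical
    # stability), then return the top label with its confidence.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    i = int(np.argmax(p))
    return labels[i], float(p[i])

# Toy example: three classes, "dog" has the highest logit.
print(classify(np.array([1.0, 3.0, 0.5]), ["cat", "dog", "mug"]))
```

On a phone, the CNN forward pass producing these logits dominates the latency; the softmax and argmax above are negligible.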
