DeViSE: A Deep Visual-Semantic Embedding Model
Andrea Frome · Greg Corrado · Jonathon Shlens · Samy Bengio · Jeff Dean · Marc'Aurelio Ranzato · Tomas Mikolov

Sun Dec 08 02:00 PM -- 06:00 PM (PST) @ Harrah's Special Events Center, 2nd Floor

Modern visual recognition systems are often limited in their ability to scale to large numbers of object categories. This limitation is in part due to the increasing difficulty of acquiring sufficient training data in the form of labeled images as the number of object categories grows. One remedy is to leverage data from other sources -- such as text data -- both to train visual models and to constrain their predictions. In this paper we present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data and semantic information gleaned from unannotated text. We demonstrate that this model matches state-of-the-art performance on the 1000-class ImageNet object recognition challenge while making more semantically reasonable errors, and also show that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training. Semantic knowledge improves such zero-shot predictions by up to 65%, achieving hit rates of up to 10% across thousands of novel labels never seen by the visual model.

Author Information

Andrea Frome (Google Research)
Greg Corrado (Google Health)
Jonathon Shlens (Google)
Samy Bengio (Apple)
Jeff Dean (Google Research)

Jeff joined Google in 1999 and is currently a Google Senior Fellow. He leads Google's Research and Health divisions, and he co-founded the Google Brain team. He has co-designed/implemented multiple generations of Google's distributed machine learning systems for neural network training and inference, as well as multiple generations of Google's crawling, indexing, and query serving systems, and major pieces of Google's initial advertising and AdSense for Content systems. He is also a co-designer and co-implementor of Google's distributed computing infrastructure, including the MapReduce, BigTable and Spanner systems, protocol buffers, LevelDB, systems infrastructure for statistical machine translation, and a variety of internal and external libraries and developer tools. He received a Ph.D. in Computer Science from the University of Washington in 1996, working with Craig Chambers on compiler techniques for object-oriented languages. He is a Fellow of the ACM, a Fellow of the AAAS, a member of the U.S. National Academy of Engineering, and a recipient of the Mark Weiser Award and the ACM Prize in Computing.

Marc'Aurelio Ranzato (DeepMind)
Tomas Mikolov (Google Research)
