Challenges in Learning Hierarchical Models: Transfer Learning and Optimization
Quoc V. Le · Marc'Aurelio Ranzato · Russ Salakhutdinov · Josh Tenenbaum · Andrew Y Ng

Fri Dec 16 10:30 PM -- 11:00 AM (PST) @ Montebajo: Library

The ability to learn abstract representations that support transfer to novel but related tasks lies at the core of solving many AI-related tasks, including visual object recognition, information retrieval, speech perception, and language understanding. Hierarchical models that support inferences at multiple levels have been developed and argued to be among the most promising candidates for achieving this goal. An important property of these models is that they can extract complex statistical dependencies from high-dimensional sensory input and efficiently learn latent variables by re-using and combining intermediate concepts, allowing these models to generalize well across a wide variety of tasks.

In the past few years, researchers across many different communities, from applied statistics to engineering, computer science and neuroscience, have proposed several hierarchical models that are capable of extracting useful, high-level structured representations. The learned representations have been shown to give promising results for solving a multitude of novel learning tasks. A few notable examples of such models include Deep Belief Networks, Deep Boltzmann Machines, sparse coding-based methods, and nonparametric and parametric hierarchical Bayesian models.

Despite recent successes, many existing hierarchical models are still far from being able to represent, identify and learn the wide variety of possible patterns and structure in real-world data. Existing models cannot cope with new tasks for which they have not been specifically trained. Even when applied to related tasks, trained systems often display unstable behavior. Furthermore, massive volumes of training data (e.g., data transferred between tasks) and high-dimensional input spaces pose challenging questions on how to effectively train deep hierarchical models. The recent availability of large-scale datasets (like ImageNet for visual object recognition or Wall Street Journal for large vocabulary speech recognition), the continuous advances in optimization methods, and the availability of cluster computing have drastically changed the working scenario, calling for a re-assessment of the strengths and weaknesses of many existing optimization strategies.

The aim of this workshop is to bring together researchers working on such hierarchical models to discuss two important challenges: the ability to perform transfer learning and the best strategies to optimize these systems on large-scale problems. These problems are "large" in terms of input dimensionality (on the order of millions), number of training samples (on the order of 100 million or more) and number of categories (on the order of several tens of thousands). During the course of the workshop, we shall be interested in discussing the following topics:

1. State of the field: What are the existing methods and what is the relationship between them? Which problems can be solved using existing learning algorithms and which require fundamentally different approaches? How are current methods optimized? Which models can scale to very high-dimensional inputs, to datasets with a large number of categories and with a huge number of training samples? Which models best leverage large amounts of unlabeled data?

2. Learning structured representations: How can machines extract invariant representations from a large supply of high-dimensional, highly structured unlabeled data? How can these representations be used to represent and learn tens of thousands of different concepts (e.g., visual object categories) and expand on them without disrupting previously learned concepts? How can these representations be used in multiple applications?

3. Transfer learning: How can previously learned representations help in learning new tasks so that less labeled supervision is needed? How can this facilitate knowledge representation for transfer learning tasks?

4. One-shot learning: For many traditional machine classification algorithms, learning curves are measured in tens, hundreds or thousands of training examples. For human learners, however, just a few training examples are often sufficient to grasp a new concept. Can we develop models that are capable of efficiently leveraging previously learned background knowledge in order to learn novel categories based on a single training example? Are there models suitable for generalizing across domains, when presented with one or few examples?

5. Scalability and success in real-world applications: How well do existing transfer learning models scale to large problems, including problems in computer vision, natural language processing, and speech perception? How well do these algorithms perform when applied to modeling high-dimensional real-world distributions?

6. Optimization: Which optimization methods are best for training a deep deterministic network? Which stochastic optimization algorithms are best for training probabilistic generative models? Which optimization strategies are best to train on several thousands of categories?

7. Parallel computing: Which optimization algorithms are best suited to GPUs, and which benefit the most from parallel computing on a cloud?

8. Theoretical Foundations: What are the theoretical guarantees of learning hierarchical models? Under what conditions is it possible to provide performance guarantees for such algorithms?

9. Suitable tasks and datasets: What are the right datasets and tasks that could be used in future research on the topic and to facilitate comparisons between methods?

In order to facilitate the discussion and to standardize results, we will invite participants to test their methods on the following two challenges.

- Transfer Learning Challenge: we will make available a dataset that has a large amount of unlabeled data and a large number of categories. The task is to categorize samples belonging to a novel category that has only a few labeled training samples available. Participants will have to follow a strict training/test protocol to make results comparable. Performance is measured in terms of accuracy as well as training and test time.

- Optimization Challenge: the aim is to test several optimization algorithms to train a non-linear predictor on three large-scale datasets (to perform a visual recognition task, a speech recognition task and a text categorization task). A strict protocol will be enforced to make results comparable, and performance will be evaluated in terms of accuracy as well as training time on a single-core machine, on a GPU, and on a cluster.
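
Both challenges report accuracy alongside wall-clock time. As a rough illustration only, the sketch below shows one way such measurements could be collected for a single method. It is not the workshop's protocol, which is not specified here. The `evaluate` helper, the nearest-centroid stand-in method, and the toy data are all hypothetical and chosen purely to keep the example self-contained.

```python
import time

def evaluate(train_fn, predict_fn, train_set, test_set):
    """Return accuracy, training time and test time (seconds) for one method."""
    start = time.perf_counter()
    model = train_fn(train_set)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [predict_fn(model, x) for x, _ in test_set]
    test_seconds = time.perf_counter() - start

    correct = sum(p == y for p, (_, y) in zip(predictions, test_set))
    return {
        "accuracy": correct / len(test_set),
        "train_seconds": train_seconds,
        "test_seconds": test_seconds,
    }

# Hypothetical stand-in method: classify by the nearest class centroid.
def train_centroids(train_set):
    sums, counts = {}, {}
    for x, y in train_set:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict_centroid(model, x):
    # Pick the label whose centroid has the smallest squared distance to x.
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))

# Toy data (made up): two labeled samples per category, one test sample each.
train_set = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([1.0, 1.0], 1), ([0.9, 1.1], 1)]
test_set = [([0.1, 0.0], 0), ([1.0, 0.9], 1)]

result = evaluate(train_centroids, predict_centroid, train_set, test_set)
print(result)
```

Timing training and prediction separately keeps the two costs distinct, which matters when a method trains slowly but predicts quickly or the reverse.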

More details: https://sites.google.com/site/nips2011workshop/

Author Information

Quoc V. Le (Google)
Marc'Aurelio Ranzato (Facebook AI Research)
Russ Salakhutdinov (Carnegie Mellon University)
Josh Tenenbaum (MIT)

Josh Tenenbaum is an Associate Professor of Computational Cognitive Science at MIT in the Department of Brain and Cognitive Sciences and the Computer Science and Artificial Intelligence Laboratory (CSAIL). He received his PhD from MIT in 1999, and was an Assistant Professor at Stanford University from 1999 to 2002. He studies learning and inference in humans and machines, with the twin goals of understanding human intelligence in computational terms and bringing computers closer to human capacities. He focuses on problems of inductive generalization from limited data -- learning concepts and word meanings, inferring causal relations or goals -- and learning abstract knowledge that supports these inductive leaps in the form of probabilistic generative models or 'intuitive theories'. He has also developed several novel machine learning methods inspired by human learning and perception, most notably Isomap, an approach to unsupervised learning of nonlinear manifolds in high-dimensional data. He has been Associate Editor for the journal Cognitive Science, has been active on program committees for the CogSci and NIPS conferences, and has co-organized a number of workshops, tutorials and summer schools in human and machine learning. Several of his papers have received outstanding paper awards or best student paper awards at the IEEE Computer Vision and Pattern Recognition (CVPR), NIPS, and Cognitive Science conferences. He is the recipient of the New Investigator Award from the Society for Mathematical Psychology (2005), the Early Investigator Award from the Society of Experimental Psychologists (2007), and the Distinguished Scientific Award for Early Career Contribution to Psychology (in the area of cognition and human learning) from the American Psychological Association (2008).

Andrew Y Ng (Baidu Research)

More from the Same Authors