

Workshop

Computational Trade-offs in Statistical Learning

Alekh Agarwal · Sasha Rakhlin
Dec 15, 10:30 PM - 11:00 AM Montebajo: Basketball Court

Since its early days, the field of Machine Learning has focused on
developing computationally tractable algorithms with good learning
guarantees. The vast literature on statistical learning theory has led
to a good understanding of how the predictive performance of different
algorithms improves as a function of the number of training
samples. By the same token, the well-developed theories of
optimization and sampling methods have yielded efficient computational
techniques at the core of most modern learning methods. The separate
development of these fields means that, given an algorithm, we have a
sound understanding of its statistical and computational
behavior. However, there has been little joint study of the
computational and statistical complexities of learning, and as a
consequence little is known about the interaction and trade-offs
between statistical accuracy and computational complexity. Indeed, a
systematic joint treatment can answer some very interesting questions:
what is the best attainable statistical error given a finite
computational budget? What is the best learning method to use under
different computational constraints and desired statistical
yardsticks? Do simple methods outperform complex ones in
computationally impoverished scenarios?


At its core, the PAC framework aims to study learning through the lens
of computation. However, its thrust is on separating polynomial-time
algorithms from computationally intractable ones. Yet all
polynomial-time computations are hardly equivalent: the difference
between a linear and a quadratic dependence on problem parameters can
have a profound effect on the applicability of an algorithm.
Understanding the trade-offs between statistical accuracy and
computational demands in this regime is of paramount importance.


The need for such a theory is more compelling now than ever, since we
routinely face training corpora with billions of examples and, often,
an even larger number of parameters to estimate. The emergence of the
web and Mechanical Turk as sources of training data often stretches
learning algorithms to the point that the bottleneck is no longer the
number of examples but the amount of computation available to process
them. A theory for choosing, in a principled way, from a multitude of
learning methods based on the properties of the training examples and
the computational resources available would be of clear
interest. Another way to pose the same problem is to design algorithms
that take a computational constraint as input and learn the best
hypothesis they can given the available budget and data.

Several lines of work address different facets of the above problem. Researchers working on massive datasets in the CS theory community study streaming methods that constrain both the computation and the storage available to an algorithm. Online learning offers one particular way of dealing with a computational budget: process as many samples as possible within the budget.
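
As one concrete illustration of the streaming constraint (our own sketch, not drawn from the workshop text), reservoir sampling maintains a uniform random sample of a stream in a single pass using memory proportional to the sample size only:

    import random

    def reservoir_sample(stream, k):
        """Keep a uniform random sample of k items in one pass, O(k) memory."""
        sample = []
        for i, x in enumerate(stream):
            if i < k:
                sample.append(x)
            else:
                j = random.randint(0, i)   # uniform over 0..i inclusive
                if j < k:
                    sample[j] = x
        return sample

    # usage: reservoir_sample(range(10**6), 100)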

More directly relevant work has appeared in the machine learning
community in the last few years. Bottou and Bousquet (2008) compare the
amount of computation needed to attain a certain statistical error for
a few routinely used optimization algorithms. Shalev-Shwartz and Srebro
(2009) show how stochastic gradient descent applied to SVM optimization
can exhibit an inverse dependence on the number of training samples in
the large-dataset regime. In more recent work, Shalev-Shwartz and
co-authors have used cryptographic conjectures to establish the
computational hardness of certain learning problems. On the algorithmic
front, coarse-to-fine learning provides a framework for systematically
incorporating computational considerations, using computational cost as
a regularization term in the learning objective. Other budgeted
algorithms, such as budgeted SVMs and budgeted perceptrons, admit hard
budget constraints on the running time and storage of the algorithm.






The goals of our workshop are:


* To draw the attention of machine learning researchers to this rich and emerging area of problems and to establish a community of researchers that are interested in understanding these tradeoffs.

* To define a number of common problems in this area and to encourage future research that is comparable and compatible.

* To expose the learning community to relevant work in fields such as CS theory and convex optimization.



We will call for papers on the following topics:


* Fundamental statistical limits with bounded computation

* Trade-offs between statistical accuracy and computational costs

* Algorithms to learn under budget constraints

* Budget constraints on other resources (like bounded memory)

* Computationally aware approaches such as coarse-to-fine learning

Workshop

Beyond Mahalanobis: Supervised Large-Scale Learning of Similarity

Greg Shakhnarovich · Dhruv Batra · Brian Kulis · Kilian Q Weinberger
Dec 15, 10:30 PM - 11:00 AM Melia Sierra Nevada: Guejar

The notion of similarity (or distance) is central in many problems in machine learning: information retrieval, nearest-neighbor based prediction, visualization of high-dimensional data, etc. Historically, similarity was estimated via a fixed distance function (typically Euclidean), sometimes engineered by hand using domain knowledge. Using statistical learning methods instead to learn similarity functions is appealing, and over the last decade this problem has attracted much attention in the community, with several publications in NIPS, ICML, AISTATS, CVPR, etc.

Much of this work, however, has focused on a specific, restricted approach: learning a Mahalanobis distance, under a variety of objectives and constraints. This effectively limits the setup to learning a linear embedding of the data.
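
Concretely, any (squared) Mahalanobis distance with a PSD matrix M is plain Euclidean distance after a linear map, which is why learning M amounts to learning a linear embedding. A minimal sketch of this identity (our own illustration; names and values are hypothetical):

    import numpy as np

    def mahalanobis_sq(x, y, M):
        """Squared Mahalanobis distance d_M(x, y) = (x - y)^T M (x - y)."""
        d = x - y
        return d @ M @ d

    # Any PSD M factors as M = L^T L, so d_M is Euclidean distance after z -> L z.
    M = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
    L = np.linalg.cholesky(M).T    # M = L^T L
    x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    assert np.isclose(mahalanobis_sq(x, y, M), np.sum((L @ x - L @ y) ** 2))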

In this workshop, we will look beyond this setup and consider methods that learn non-linear embeddings of the data, either explicitly via non-linear mappings or implicitly via kernels. We will especially encourage discussion of methods suitable for the large-scale problems increasingly facing practitioners of learning methods: large numbers of examples, high dimensionality of the original space, and/or massively multi-class problems (e.g., classification with 10,000+ categories, the 10,000,000 images of the ImageNet dataset).

Our goals are to


1. Create a comprehensive understanding of the state-of-the-art in similarity learning, via presentation of recent work,
2. Initiate an in-depth discussion on major open questions brought up by research in this area. Among these questions:

* Are there gains to be made from introducing non-linearity into similarity models?
* When the underlying task is prediction (classification or regression) are similarity functions worth learning, instead of attacking the prediction task directly? A closely related question - when is it beneficial to use nearest neighbor based methods, with learned similarity?
* What is the right loss (or objective) function to minimize in similarity learning?
* It is often claimed that inherent structure in real data (e.g. low-dimensional manifolds) makes learning easier. How, if at all, does this affect similarity learning?
* What are similarities/distinctions between learning similarity functions and learning hashing?
* What is the relationship between unsupervised similarity learning (often framed as dimensionality reduction) and the supervised similarity learning?
* Are there models of learning nonlinear similarities for which bounds (e.g., generalization error, regret bounds) can be proven?
* What algorithmic techniques must be employed or developed to scale nonlinear similarity learning to extremely large data sets?



We will encourage the invited speakers to address these questions in their talks, and will steer the panel discussion towards some of these.

The target audience of this workshop consists of two (overlapping) groups:
-- practitioners of machine learning who deal with large scale problems where the ability to more accurately predict similarity values is important, and
-- core machine learning researchers working on learning similarity/distance/metric and on similarity-based prediction methods.

Workshop

Relations between machine learning problems - an approach to unify the field

Robert Williamson · John Langford · Ulrike von Luxburg · Mark Reid · Jennifer Wortman Vaughan
Dec 15, 10:30 PM - 11:00 AM Melia Sierra Nevada: Dilar

What:
The workshop proposes to focus on relations between machine learning problems. We use “relation” quite generally to include (but not limit ourselves to) notions such as: one type of problem being viewed as a special case of another type (e.g., classification as thresholded probability estimation); reductions between learning problems (e.g., transforming ranking problems into classification problems); and the use of surrogate losses (e.g., replacing misclassification loss with some other, convex loss). We also include relations between sets of learning problems, such as those studied in the (old) theory of “comparison of experiments”, as well as recent connections between machine learning problems and what could be construed as "economic learning problems", such as prediction markets and forecast elicitation.


Why: The point of studying relations between machine learning problems is that it offers a plausible route to understanding the field of machine learning as a whole. It could serve to prevent re-invention and to accelerate the growth of new methods. The motivation is not dissimilar to Hal Varian’s notion of combinatorial innovation. Another analogy is the development of function theory in the 19th century and the rapid advances made possible by functional analysis, which, rather than studying individual functions, studied operators that transform one function into another.

Much recent work in machine learning can be interpreted as relations between problems. For example:
• Surrogate regret bounds (bound the performance attained for one learning problem in terms of that obtained for another) [Bartlett et al, 2007]
• Relationships between binary classification problems and distances between probability distributions [Reid and Williamson 2011]
• Reductions from class probability estimation to classification, or reinforcement learning to classification [Langford et al; 2005-]
More recently there have been connections to problems that do not even seem to be about machine learning, such as the equivalence between
• Cost-function based prediction markets and no-regret learning [Chen and Wortman-Vaughan 2010]
• Elicitability of properties of distributions and proper losses [Lambert 2011]

In fact some older work in machine learning can be viewed as relations between problems:
• Learning with real-valued functions in the presence of noise can be reduced to multiclass classification [Bartlett, Long & Williamson 1996]
• Comparison of Experiments [Blackwell 1955]


If one attempts to construct a catalogue of machine learning problems at present, one is rapidly overwhelmed by the complexity. And it is not at all clear (on the basis of the usual descriptions) whether or not two problems with different names are really different. (If the reader is unconvinced, consider the following partial list: batch, online, transductive, off-training-set, semi-supervised, noisy (label, attribute, constant noise / variable noise, data of variable quality), data of different costs, weighted loss functions, active, distributed, classification (binary, weighted binary, multi-class), structured output, probabilistic concepts / scoring rules, class probability estimation, learning with statistical queries, Neyman-Pearson classification, regression, ordinal regression, ranked regression, ranking, ranking the best, optimising the ROC curve, optimising the AUC, selection, novelty detection, multi-instance learning, minimum volume sets, density level sets, regression level sets, sets of quantiles, quantile regression, density estimation, data segmentation, clustering, co-training, co-validation, learning with constraints, conditional estimators, estimated loss, confidence / hedging estimators, hypothesis testing, distributional distance estimation, learning relations, learning total orders, learning causal relationships, and estimating performance (cross-validation)!)


Specific topics: We solicit contributions on novel relations between machine learning problems, as well as theoretical and practical frameworks for constructing such relations. We are not restricting the workshop to pure theory, although it seems natural that the workshop will have a theoretical bent.

Who: We believe the workshop will be of considerable interest to theoretically inclined machine learning researchers, as it offers a new view of how to situate one’s work. Furthermore, we believe it should be of interest to practitioners, because being able to relate a new problem to an old one can save the substantial work of constructing a new solution.

Outcomes:
• New relations between learning problems – not individual solutions to individual problems
• Visibility and promulgation of the “meme” of relating problems;
• We believe the nature of the workshop would suit the publication of workshop proceedings.
• Potential agreement to a shared community effort to build a comprehensive map of the relations between machine learning problems.

Workshop

Decision Making with Multiple Imperfect Decision Makers

Tatiana V. Guy · Miroslav Karny · David H Wolpert · Alessandro VILLA · David Rios Insua
Dec 15, 10:30 PM - 11:00 AM Melia Sol y Nieve: Snow

Organisers:
Tatiana V. Guy, Miroslav Kárný, Institute of Information Theory and Automation, Czech Republic
David Wolpert, NASA Ames Research Center, USA
David Rios Insua, Royal Academy of Sciences, Spain
Alessandro E.P. Villa, University of Lausanne, Switzerland

OVERVIEW
The prescriptive Bayesian theory of dynamic decision making under uncertainty and incomplete knowledge has reached a high level of maturity. It is well supported by efficient and theoretically justified algorithms that respect the various physical constraints present in applications. The research repeatedly stresses the influence of imperfectness, i.e. the limited cognitive and evaluative resources of decision makers, which should be taken into account. Decision making with imperfect decision makers, however, lacks a firm prescriptive ground. This problem emerges repeatedly and has proved difficult to solve. For instance, i) the consistent theory of incomplete Bayesian games cannot be applied by imperfect participants; ii) a desirable incorporation of “deliberation effort” into the design of decision-making strategies remains unsolved. At the same time, real societal, biological, economic and engineered systems cope with imperfectness in practice, and many descriptive studies confirm their efficiency. The need to understand and remove this discrepancy motivated the preceding NIPS 2010 workshop Decision Making with Multiple Imperfect Decision Makers, which aimed to exploit descriptive analysis of the properties of interacting imperfect decision makers in order to break the barrier faced by the prescriptive theory and to enable its ubiquitous use. That workshop opened several new directions towards this ambitious goal and stimulated an exchange of available results and emerging ideas. The proposed continuation at NIPS 2011 will keep its focus on this direction while encouraging ideas and discussion on bounded rationality and the imperfection of decision makers.
The NIPS 2011 workshop will examine ways of: a) formalising rational decision making of imperfect decision makers; b) creating a prescriptive theory, which respects imperfect decision makers; c) extending the existing feasible prescriptive theories to make them useful for imperfect decision makers; d) recognising the key features allowing real systems to cope with the imperfectness; e) generalising conceptual and algorithmic approaches coping with imperfectness; f) opening new paradigms in addressing the issue of imperfectness.
The workshop aims to bring together different scientific communities, to brainstorm possible research directions, and to encourage collaboration among researchers with complementary ideas and expertise. The workshop will be based on invited talks, contributed talks and posters. Extensive moderated and informal discussions will ensure the targeted information exchange.

Call for Contributions
The workshop will include talks with discussions and poster sessions. We invite participants to submit draft papers describing the technical content of the proposed contribution. Selected submissions may be accepted either as an oral presentation or as a poster presentation. We especially encourage participants who can contribute in the following areas:
•Formalisation of rational decision making of interacting imperfect decision makers
•Ways leading to a feasible prescriptive theory supporting imperfect decision makers
•Case studies and lessons learnt from nature, technology and society with respect to previous items
•Ideas, tricks and algorithms suitable for a systematic support of imperfect decision makers


Accepted papers will be made available online at the workshop website http://www.utia.cz/NIPSHome. Selected authors will be invited to prepare a full version of the paper to be published in workshop proceedings, either in the NIPS series or similarly as in 2010 (Decision Making with Multiple Imperfect Decision Makers, Eds. TV Guy, M Kárný, DH Wolpert, Springer-Verlag GmbH Berlin/Heidelberg).

Targeted Audience
- NIPS community
- Scientists and students from the different scientific communities (decision science, cognitive science, natural science, social science, engineering, etc.) interested in various aspects of decision making, especially with multiple decision makers

Workshop

From statistical genetics to predictive models in personalized medicine

Karsten Borgwardt · Oliver Stegle · Shipeng Yu · Glenn Fung · Faisal Farooq · Balaji R Krishnapuram
Dec 15, 10:30 PM - 11:00 AM Melia Sol y Nieve: Slalom

Background

Technological advances in profiling medical patients have led to a paradigm change in medical prognosis. Diagnosis by medical experts is increasingly complemented by large-scale data collection and quantitative genome-scale molecular measurements. Data that are already available today or will enter medical practice in the near future include personal medical records, genotype information, diagnostic tests, proteomics and other emerging ‘omics’ data types.

This rich source of information forms the basis of future medicine, and of personalized medicine in particular. Predictive methods for personalized medicine make it possible to integrate these patient-specific data (genetics, exams, demographics, imaging, lab tests, genomics, etc.), both to improve prognosis and to design an individually optimal therapy.

However, the statistical and computational approaches behind these analyses face a number of major challenges: identifying and correcting for structured influences within the data, dealing with missing data, and coping with the statistical issues that come with carrying out millions of statistical tests. Also, to render these methods useful in practice, computational efficiency and scalability to large-scale datasets are an integral requirement. Finally, any computational approach needs to be tightly integrated with medical practice in order to actually be used, and the experience gained needs to be fed back into future development and improvements.

To address these technical difficulties and to allow for efficient integration and application in a medical context, it is necessary to bring together the communities of statistical method developers, clinicians and biological investigators.

Purpose of Workshop

The purpose of this 2nd cross-discipline workshop is to bring together machine learning, statistical genetics and healthcare researchers interested in problems and applications of predictive models in the field of personalized medicine. The goal of the workshop will be to bridge the gap between the theory of predictive models and statistical genetics with respect to medical applications and the pressing needs of the healthcare community. The workshop will promote an exchange of ideas, helping to identify important and challenging applications and to discover possible synergies. Ideally, we hope that such discussion will lead to interdisciplinary collaborations with resulting collaborative grant submissions. The emphasis will be on the statistical and engineering aspects of predictive models and how they relate to practical medical and biological problems.

Although related in a broad sense, the workshop does not directly overlap with the fields of Bioinformatics and Biostatistics. While predictive modeling for healthcare has been explored by biostatisticians for several decades, the focus of this workshop is on substantially different needs and problems that are better addressed by modern machine learning technologies. For example, how should we organize clinical trials to validate the clinical utility of predictive models for personalized therapy selection? How can we integrate and combine heterogeneous data while accounting for confounding influences? How can we ensure the computational efficiency that renders these methods useful in practice?

The focus of this workshop will be methods to address these and related questions. The focus is not on questions of basic science; rather, we will focus on predictive models that combine available patient data while resolving the technical and statistical challenges through modern machine learning.

The workshop program will combine presentations by invited speakers from both machine learning, statistical genetics and personalized medicine fields and by authors of extended abstracts submitted to the workshop. In addition, we will reserve sufficient room for discussion both in the forms of an open panel as well as in the context of poster presentations.

Target Participants

The target participants are researchers interested in predictive models for:

· Preventive medicine
· Therapy selection
· Statistical genetics
· Medical genetics
· Precision diagnostics (more precise diagnostics, disease sub-typing)
· Companion diagnostics/therapeutics
· Patient risk assessment (for incidence of diseases)
· Personalized medicine
· Integrated diagnostics combining multiple modalities like imaging, genomics and in-vitro diagnostics

Workshop

Bayesian optimization, experimental design and bandits: Theory and applications

Nando de Freitas · Roman Garnett · Frank R Hutter · Michael A Osborne
Dec 15, 10:30 PM - 11:00 AM Melia Sierra Nevada: Hotel Bar
Recently, we have witnessed many important advances in learning approaches for sequential decision making. These advances have occurred in different communities, which refer to the problem using different terminology: Bayesian optimization, experimental design, bandits (X-armed bandits, contextual bandits, Gaussian process bandits), active sensing, personalized recommender systems, automatic algorithm configuration, reinforcement learning and so on. These communities tend to use different methodologies too: some focus more on practical performance, while others are more concerned with theoretical aspects of the problem. As a result, they have derived and engineered a diverse range of methods for trading off exploration and exploitation in learning. For these reasons, it is timely and important to bring these communities together to identify differences and commonalities, to propose common benchmarks, to review the many practical applications (interactive user interfaces, automatic tuning of parameters and architectures, robotics, recommender systems, active vision, and more), to narrow the gap between theory and practice, and to identify strategies for attacking high dimensionality.
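
The exploration-exploitation trade-off these communities share can be made concrete with a small example: a minimal sketch of the classic UCB1 rule for a finite-armed bandit (one of many methods in scope here; the function names and reward model are our own illustration, not any particular paper's):

    import numpy as np

    def ucb1(pull, n_arms, horizon):
        """UCB1: play each arm once, then the arm with the best upper bound."""
        counts = np.zeros(n_arms)
        means = np.zeros(n_arms)
        for t in range(horizon):
            if t < n_arms:
                arm = t                                   # initial round-robin
            else:
                bonus = np.sqrt(2.0 * np.log(t) / counts)  # exploration bonus
                arm = int(np.argmax(means + bonus))        # optimism in the face of uncertainty
            r = pull(arm)
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]   # running mean update
        return means

    # usage with hypothetical Bernoulli arms:
    # ucb1(lambda a: float(np.random.rand() < [0.2, 0.5, 0.7][a]), 3, 10_000)
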
Workshop

Cosmology meets Machine Learning

Michael Hirsch · Sarah Bridle · Bernhard Schölkopf · Phil Marshall · Stefan Harmeling · Mark Girolami
Dec 15, 10:30 PM - 11:00 AM Melia Sierra Nevada: Monachil

Cosmology aims at understanding the universe and its evolution through scientific observation and experiment, and hence addresses one of the most profound questions of humankind. With the establishment of robotic telescopes and wide sky surveys, cosmology already faces the challenge of evaluating vast amounts of data. Multiple projects will image large fractions of the sky in the next decade; for example, the Dark Energy Survey will culminate in a catalogue of 300 million objects extracted from petabytes of observational data. The importance of automatic data evaluation and analysis tools for the success of these surveys is undisputed.

Many problems in modern cosmological data analysis are tightly related to fundamental problems in machine learning, such as classifying stars and galaxies, and finding clusters of dense galaxy populations. Other typical problems include data reduction, probability density estimation, how to deal with missing data and how to combine data from different surveys.

An increasing part of modern cosmology aims at developing new statistical data analysis tools and studying their behaviour and systematics, often unaware of recent developments in machine learning and computational statistics.

Therefore, the objectives of this workshop are two-fold:

(i) The workshop aims to bring together experts from the Machine Learning and Computational Statistics community with experts in the field of cosmology to promote, discuss and explore the use of machine learning techniques in data analysis problems in cosmology and to advance the state of the art.

(ii) By presenting current approaches, their possible limitations, and open data analysis problems in cosmology to the NIPS community, this workshop aims to encourage scientific exchange and to foster collaborations among the workshop participants.

The workshop is proposed as a one-day workshop organised jointly by experts in the fields of empirical inference and cosmology. The target participants are researchers working in cosmological data analysis as well as researchers from the whole NIPS community who share an interest in real-world applications in a fascinating, fast-progressing field of fundamental research. Given the mixed participation of computer scientists and cosmologists, the invited speakers will be asked to give talks of a tutorial character and to make the covered material accessible to both groups.

Workshop

Copulas in Machine Learning

Gal Elidan · Zoubin Ghahramani · John Lafferty
Dec 15, 10:30 PM - 11:00 AM Melia Sierra Nevada: Genil

From high-throughput biology and astronomy to voice analysis and medical diagnosis, a wide variety of complex domains are inherently continuous and high dimensional. The statistical framework of copulas offers a flexible tool for modeling highly non-linear multivariate distributions for continuous data. Copulas are a theoretically and practically important tool from statistics that explicitly allow one to separate the dependency structure between random variables from their marginal distributions. Although bivariate copulas are a widely used tool in finance, and have even been famously accused of "bringing the world financial system to its knees" (Wired Magazine, Feb. 23, 2009), the use of copulas for high dimensional data is in its infancy.
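
As a small illustration of this separation of dependence from marginals (our own sketch, not part of the workshop program), one can sample dependence from a bivariate Gaussian copula and then impose arbitrary marginals on it:

    import numpy as np
    from scipy import stats

    rho = 0.8                                  # dependence strength
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = np.random.multivariate_normal([0.0, 0.0], cov, size=1000)
    u = stats.norm.cdf(z)                      # Gaussian copula: uniform marginals
    x = stats.expon.ppf(u[:, 0])               # impose an exponential marginal
    y = stats.gamma.ppf(u[:, 1], a=2.0)        # impose a gamma marginal
    # (x, y) now have the chosen marginals but the Gaussian dependence of z.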

While studied in statistics for many years, copulas have only recently been noticed by a number of machine learning researchers, with this "new" tool appearing at recent leading machine learning conferences (ICML, UAI and NIPS). The goal of this workshop is to promote the further understanding and development of copulas for the kinds of complex modeling tasks that are the focus of machine learning. Specifically, the goals of the workshop are to:

* draw the attention of machine learning researchers to the important framework of copulas

* provide a theoretical and practical introduction to copulas

* identify promising research problems in machine learning that could exploit copulas

* bring together researchers from the statistics and machine learning communities working in this area.

The target audience includes leading researchers from academia and industry, with the aim of facilitating cross-fertilization between different perspectives.

Workshop

Machine Learning and Interpretation in Neuroimaging (MLINI-2011)

Melissa K Carroll · Guillermo Cecchi · Kai-min K Chang · Moritz Grosse-Wentrup · James Haxby · Georg Langs · Anna Korhonen · Bjoern Menze · Brian Murphy · Janaina Mourao-Miranda · Vittorio Murino · Francisco Pereira · Irina Rish · Mert Sabuncu · Irina Simanova · Bertrand Thirion
Dec 15, 10:30 PM - 11:00 AM Melia Sol y Nieve: Aqua

https://sites.google.com/site/mlini2011/

SUBMISSION DEADLINE: October 17, 2011

Primary contacts:

* Moritz Grosse-Wentrup moritzgw@ieee.org
* Georg Langs langs@csail.mit.edu
* Brian Murphy brian.murphy@unitn.it
* Irina Rish rish@us.ibm.com


MOTIVATION:

Modern multivariate statistical methods have been increasingly applied to various problems in neuroimaging, including “mind reading”, “brain mapping”, clinical diagnosis and prognosis. Multivariate pattern analysis (MVPA) is a promising machine-learning approach for discovering complex relationships between high-dimensional signals (e.g., brain images) and variables of interest (e.g., external stimuli and/or the brain's cognitive states). Modern multivariate regularization approaches can overcome the curse of dimensionality and produce highly predictive models even in the high-dimensional, small-sample scenarios typical of neuroimaging (e.g., tens of thousands of voxels and just a few hundred samples).
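
A minimal sketch of such a regularized, high-dimensional, small-sample setup (the synthetic data, parameter choices, and use of scikit-learn are our own, purely illustrative assumptions):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n, p = 200, 5_000                          # few samples, many "voxels"
    X = rng.standard_normal((n, p))
    w = np.zeros(p)
    w[:50] = 0.5                               # only 50 informative features
    labels = (X @ w + rng.standard_normal(n) > 0).astype(int)

    # Sparsity-inducing (L1) regularization keeps the model predictive despite p >> n.
    clf = LogisticRegression(penalty="l1", C=0.5, solver="liblinear")
    print(cross_val_score(clf, X, labels, cv=5).mean())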

However, despite the rapidly growing number of machine-learning applications in neuroimaging, the impact of machine learning on how theories of brain function are construed has received little consideration. Accordingly, machine-learning techniques are frequently met with skepticism in the domain of cognitive neuroscience. In this workshop, we intend to investigate the implications of adopting machine-learning methods for studying brain function. In particular, this concerns the question of how these methods may be used to represent cognitive states, and what ramifications this has for subsequent theories of cognition. Besides providing a rationale for the use of machine-learning methods in studying brain function, a further goal of this workshop is to identify shortcomings of state-of-the-art approaches and to initiate research efforts that increase the impact of machine learning on cognitive neuroscience.


Decoding higher cognition and interpreting the behaviour of associated classifiers can pose unique challenges, as these psychological states are complex, fast-changing and often ill-defined. For instance, speech is received at 3-4 words a second; acoustic, semantic and syntactic processing occur in parallel; and the form of underlying representations (sentence structures, conceptual descriptions) remains controversial. ML techniques are required that can take advantage of patterns that are temporally and spatially distributed, but coordinated in their activity. And different recording modalities have distinctive advantages: fMRI provides millimetre-level localisation in the brain but poor temporal resolution, while EEG and MEG have millisecond temporal resolution at the cost of spatial resolution. Ideally machine learning methods would be able to meaningfully combine complementary information from these different neuroimaging techniques, and reveal latent dimensions in neural activity, while still being capable of disentangling tightly linked and confounded sub-processes.

Moreover, from the machine learning perspective, neuroimaging is a rich source of challenging problems that can facilitate the development of novel approaches. For example, feature extraction and feature selection become particularly important in neuroimaging, since the primary objective is to gain scientific insight rather than simply learn a "black-box" predictor. However, unlike some other applications where the set of features is by now quite well explored and established, neuroimaging is a domain where a machine-learning researcher cannot simply ask domain experts which features should be used, since this is essentially the question the domain experts themselves are trying to answer. While current neuroscientific knowledge can guide the definition of specialized 'brain areas', more complex patterns of brain activity, such as spatio-temporal patterns, functional network patterns, and other multivariate dependencies, remain to be discovered mainly via statistical analysis.


Open questions

The list of open questions of interest to the workshop includes, but is not limited to the following:

* How can we interpret results of multivariate models in a neuroscientific context?
* How suitable are MVPA and inference methods for brain mapping?
* How can we assess the specificity and sensitivity?
* What is the role of decoding vs. embedded or separate feature selection?
* How can we use these approaches for a flexible and useful representation of neuroimaging data?
* What can we accomplish with generative vs. discriminative modelling?
* How can ML techniques help us in modeling higher cognitive processes (e.g. reasoning, communication, knowledge representation)?
* How can we disentangle confounded processes and representations?
* How do we combine data from different recording modalities (e.g. fMRI, EEG, structural MRI, DTI, MEG, NIRS, ECoG, single-cell recordings, etc.)?

This workshop is part of the PASCAL2 Thematic Programme on Cognitive Inference and Neuroimaging (http://mlin.kyb.tuebingen.mpg.de/).

Workshop

Big Learning: Algorithms, Systems, and Tools for Learning at Scale

Joseph E Gonzalez · Sameer Singh · Graham Taylor · James Bergstra · Alice Zheng · Misha Bilenko · Yucheng Low · Yoshua Bengio · Michael Franklin · Carlos Guestrin · Andrew McCallum · Alexander Smola · Michael Jordan · Sugato Basu
Dec 15, 10:30 PM - 11:00 AM Montebajo: Theater

Driven by cheap commodity storage, fast data networks, rich structured models, and the increasing desire to catalog and share our collective experiences in real time, the scale of many important learning problems has grown well beyond the capacity of traditional sequential systems. These “Big Learning” problems arise in many domains including bioinformatics, astronomy, recommendation systems, social networks, computer vision, web search and online advertising. Simultaneously, parallelism has emerged as a dominant, widely used computational paradigm in devices ranging from energy-efficient mobile processors, to desktop supercomputers in the form of GPUs, to massively scalable cloud computing services. The Big Learning setting has attracted intense interest across industry and academia, with active research spanning diverse fields from machine learning and databases to large-scale distributed systems and programming languages. However, because the Big Learning setting is being studied by experts from these various communities, there is a need for a common venue to discuss recent progress, to identify pressing new challenges, and to exchange new ideas.


This workshop aims to:

* Bring together parallel and distributed system builders in industry and academia, machine learning experts, and end users to identify the key challenges, opportunities, and myths of Big Learning. What REALLY changes from the traditional learning setting when faced with terabytes or petabytes of data?
* Solicit practical case studies, demos, benchmarks and lessons-learned presentations, and position papers.
* Showcase recent and ongoing progress towards parallel ML algorithms
* Provide a forum for exchange regarding tools, software, and systems that address the Big Learning problem.
* Educate the researchers and practitioners across communities on state-of-the-art solutions and their limitations, particularly focusing on key criteria for selecting task- and domain-appropriate platforms and algorithms.


Focal points for discussions and solicited submissions include but are not limited to:

1. Case studies of practical applications that operate on large data sets or computationally intensive models; typical data and workflow patterns; machine learning challenges and lessons learned.
2. Insights about the end users for large-scale learning: who are they, what are their needs, what expertise is required of them?
3. Common data characteristics: is it more typical for data to appear in streams or in batches? What are the applications that demand online or real-time learning, and how can the engineering challenges for deploying autonomously adaptive systems be overcome? Which analytic and learning problems are more appropriate for (or even require) analysis in the cloud, and when is “desktop” learning on sub-sampled or compressed data sufficient?
4. Choices in data storage and management, e.g., trade-offs between classical RDBMS and NoSQL platforms from a data analysis and machine learning perspectives.
5. The feasibility of alternate structured data storage: object databases, graph databases, and streams.
6. Suitability of different distributed system platforms and programming paradigms: Hadoop, DryadLINQ, EC2, Azure, etc.
7. Applicability of different learning and analysis techniques: prediction models that require large-scale training vs. simpler data analysis (e.g., summary statistics); which is needed when?
8. Computationally intensive learning and inference: Big Learning doesn’t just mean Big Data; it can also mean massive models or structured prediction tasks.
9. Labeling and supervision: scenarios for large-scale label availability and appropriate learning approaches. Making use of diverse labeling strategies (curated vs. noisy/crowd-sourced/feedback-based labeling).
10. Real-world deployment issues: initial prototyping requires quickly-implemented-and-expandable solutions, along with the ability to easily incorporate new features/data sources.
11. Practicality of high-performance hardware for large-scale learning (e.g., GPUs, FPGAs, ASIC). GPU vs. CPU processors: programming strategies and performance opportunities and tradeoffs.
12. Unifying the disparate data structures and software libraries that have emerged in the GP-GPU community.
13. Evaluation methodology and trade-offs between machine learning metrics (predictive accuracy), computational performance (throughput, latency, speedup), and engineering complexity and cost.
14. Principled methods for dealing with huge numbers of features. As the number of data points grows, often so do the number of features and their dependence structure. Does Big Learning require, for example, better ways of doing multiple hypothesis testing than FDR? (See the sketch after this list.)
15. Determination of when is an answer good enough. How can we efficiently estimate confidence intervals over Big Data?
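
For item 14, one standard baseline is the classical Benjamini-Hochberg FDR procedure; a minimal sketch follows (our own illustration of the baseline, not a proposed solution):

    import numpy as np

    def benjamini_hochberg(pvals, alpha=0.05):
        """Boolean mask of hypotheses rejected at FDR level alpha."""
        p = np.asarray(pvals)
        m = p.size
        order = np.argsort(p)
        # largest k with p_(k) <= alpha * k / m
        below = p[order] <= alpha * np.arange(1, m + 1) / m
        k = (np.max(np.nonzero(below)[0]) + 1) if below.any() else 0
        reject = np.zeros(m, dtype=bool)
        reject[order[:k]] = True               # reject the k smallest p-values
        return reject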


Target audience includes industry and academic researchers from the various subfields relevant to large-scale machine learning, with a strong bias for either position talks that aim to induce discussion, or accessible overviews of the state-of-the-art. We will solicit paper submissions in the form of short, long and position papers as well as demo proposals. Papers that focus on emerging applications or deployment case studies will be particularly encouraged, while demos of operational toolkits and platforms will be considered for inclusion in the primary program of the workshop.

Workshop

Deep Learning and Unsupervised Feature Learning

Yoshua Bengio · Adam Coates · Yann Lecun · Nicolas Le Roux · Andrew Y Ng
Dec 15, 10:30 PM - 11:00 AM Telecabina: Movie Theater

In recent years, there has been a lot of interest in algorithms that learn feature hierarchies from unlabeled data. Deep learning methods such as deep belief networks, sparse coding-based methods, convolutional networks, and deep Boltzmann machines, have shown promise and have already been successfully applied to a variety of tasks in computer vision, audio processing, natural language processing, information retrieval, and robotics. In this workshop, we will bring together researchers who are interested in deep learning and unsupervised feature learning, review the recent technical progress, discuss the challenges, and identify promising future research directions.

Through invited talks, panels and discussions (see program schedule), we will attempt to address some of the more controversial topics in deep learning today, such as whether hierarchical systems are more powerful, the issues of scalability of deep learning, and what principles should guide the design of objective functions used to train these models.

The workshop will also invite paper submissions on the development of unsupervised feature learning and deep learning algorithms, theoretical foundations, inference and optimization, semi-supervised and transfer learning, and applications of deep learning and unsupervised feature learning to real-world tasks. Papers will be presented as oral or poster presentations (with a short spotlight presentation).

The workshop will also have two panel discussion sessions. The main topics of discussion will include:

* Whether/why hierarchical systems are really needed
* How to build hierarchical systems: advantages and disadvantages of bottom-up vs. top-down paradigm.
* Principles underlying learning of hierarchical systems: sparsity, reconstruction, (if supervised) what kind of supervision, how to learn invariances, etc.
* Issues of scalability of unsupervised feature learning and deep learning systems
* Major milestones and goals for the next 5 or 10 years
* Critiques of deep learning
* Real-world applications: what are challenging tasks and datasets?
* Relation to neuroscience: Can or should we design models that are more closely inspired by biological systems? Can we explain neural coding?

Panel discussions will be led by the members of the organizing committee as well as by prominent researchers from related fields.

The goal of this workshop is two-fold. First, we want to identify the next big challenges and propose research directions for the deep learning community. Second, we want to bridge the gap between researchers working on different (but related) fields, to leverage their expertise, and to encourage the exchange of ideas with all the other members of the NIPS community.


The proposed workshop builds on and extends the very successful Deep Learning and Unsupervised Feature Learning workshop held at NIPS 2010, which had over 150 attendees and received 30 research paper submissions.


The tentative timeline is (might be revised depending on the timing of notification of workshop acceptance):


August 30: Call for papers released
October 21: Paper submissions due
October 21 - November 7: Reviewing period
November 11: Notification of acceptance or rejection
December 1: Final version of papers due (for online proceedings)
December 16 or 17: Workshop*

* If possible, we'd prefer a Friday workshop date, which would allow us to organize a dinner for the attendees; but either day is fine.

Workshop

Optimization for Machine Learning

Suvrit Sra · Stephen Wright · Sebastian Nowozin
Dec 15, 10:30 PM - 11:00 AM Melia Sierra Nevada: Dauro

Dear NIPS Workshop Chairs,

We propose to organize the workshop

OPT2011 "Optimization for Machine Learning."


This workshop builds on the precedent established by our previous, very well-received NIPS workshops, OPT2008--OPT2010 (URLs are cited in the last box).

The OPT workshops enjoyed packed (to overpacked) attendance, and this enthusiastic reception underscores the strong interest, relevance, and importance that optimization enjoys in the ML community.

This continued interest in optimization is readily explained: optimization lies at the heart of ML algorithms. Sometimes classical textbook algorithms suffice, but the majority of problems require tailored methods based on a deeper understanding of the ML requirements. In fact, ML applications and researchers are driving some of the most cutting-edge developments in optimization today. This intimate relation of optimization with ML is the key motivation for our workshop, which aims to foster discussion, discovery, and dissemination of the state-of-the-art in optimization.

FURTHER DETAILS
--------------------------------
Optimization is indispensable to many machine learning algorithms. What can we say beyond this obvious realization?

Previous talks at the OPT workshops have covered frameworks for convex programs (D. Bertsekas), the intersection of ML and optimization, especially in the area of SVM training (S. Wright), large-scale learning via stochastic
gradient methods and its tradeoffs (L. Bottou, N. Srebro), exploitation of structured sparsity in optimization (Vandenberghe), randomized methods for extremely large-scale convex optimization (A. Nemirovski), and complexity theoretic foundations of convex optimization (Y. Nesterov), among others.

Several important realizations were brought to the fore by these talks, and many of the dominant ideas will appear in our forthcoming book, "Optimization for Machine Learning" (MIT Press, 2011).

Much interest has focused recently on stochastic methods, which can be used in an online setting and in settings where data sets are extremely large and high accuracy is not required. Many aspects of stochastic gradient remain to be explored, for example: different algorithmic variants, customization to the data set structure, convergence analysis, sampling techniques, software, choice of regularization and trade-off parameters, and distributed and parallel computation. An up-to-date analysis of algorithms for nonconvex problems remains an important practical need, which becomes even more pronounced as ML tackles ever more complex mathematical models.

Finally, we do not wish to ignore the "not particularly large scale" setting, where one does have time to wield substantial computational resources. In this setting, high-accuracy solutions and a deep understanding of the lessons contained in the data are needed. Examples valuable to MLers include the exploration of genetic and environmental data to identify risk factors for disease, and problems where the amount of observed data is not huge but the mathematical model is complex.


PRELIMINARY CFP (which will be circulated) FOLLOWS

------------------------------------------------------------------------------
OPT 2011
(proposed) NIPS Workshop on Optimization for Machine Learning
NIPS2011 Workshop
URL: http://opt.kyb.tuebingen.mpg.de/index.html
------------------------------------------------------------------------------


Abstract
--------

Optimization is a well-established, mature discipline. But the way we use this discipline is undergoing a rapid transformation: the advent of modern data-intensive applications in statistics, scientific computing, data mining and machine learning is forcing us to drop theoretically powerful methods in favor of simpler but more scalable ones. This changeover exhibits itself most starkly in machine learning, where we often have to process massive datasets; this necessitates not only reliance on large-scale optimization techniques, but also the development of methods "tuned" to the specific needs of machine learning problems.


Background and Objectives
-------------------------

We build on OPT2008, OPT2009, and OPT2010---the forerunners of this workshop. All three workshops happened as a part of NIPS. Beyond this major precedent, there have been other related workshops such as the "Mathematical
Programming in Machine Learning / Data Mining" series (2005 to 2007) and the BigML NIPS 2007 workshop.

Our workshop has the following major aims:

* Provide a platform for increasing the interaction between researchers from optimization, operations research, statistics, scientific computing, and machine learning;
* Identify key problems and challenges that lie at the intersection of optimization and ML;
* Narrow the gap between optimization and ML, to help reduce rediscovery, and thereby accelerate new advances.


Call for Participation
----------------------

This year we invite two types of submissions to the workshop:

(i) contributed talks and/or posters
(ii) open problems

For the latter, we request the authors to prepare a few slides that clearly
present, motivate, and explain an important open problem --- the main aim here
is to foster active discussion. Our call for open problems is modeled after a
similar session that takes place at COLT. The topics of interest for the open
problem session are the same as those for regular submissions; please see
below for details.

In addition to open problems, we invite high quality submissions for
presentation as talks or poster presentations during the workshop. We are
especially interested in participants who can contribute theory / algorithms,
applications, or implementations with a machine learning focus on the
following topics:

Topics
------

* Stochastic, Parallel and Online Optimization
- Large-scale learning, massive data sets
- Distributed algorithms
- Optimization on massively parallel architectures
- Optimization using GPUs, Streaming algorithms
- Decomposition for large-scale, message-passing and online learning
- Stochastic approximation
- Randomized algorithms

* Algorithms and Techniques (application oriented)
- Global and Lipschitz optimization
- Algorithms for non-smooth optimization
- Linear and higher-order relaxations
- Polyhedral combinatorics applications to ML problems

* Nonconvex Optimization
- Nonconvex quadratic programming, including binary QPs
- Convex Concave Decompositions, D.C. Programming, EM
- Training of deep architectures and large hidden variable models
- Approximation Algorithms
- Nonconvex, nonsmooth optimization

* Optimization with Sparsity constraints
- Combinatorial methods for L0 norm minimization
- L1, Lasso, Group Lasso, sparse PCA, sparse Gaussians
- Rank minimization methods
- Feature and subspace selection

* Combinatorial Optimization
- Optimization in Graphical Models
- Structure learning
- MAP estimation in continuous and discrete random fields
- Clustering and graph-partitioning
- Semi-supervised and multiple-instance learning


Important Dates
---------------

* Deadline for submission of papers: 21st October 2011
* Notification of acceptance: 12th November 2011
* Final version of submission: 24th November 2011


Please note that at least one author of each accepted paper must be available
to present the paper at the workshop. Further details regarding the
submission process are available at the workshop homepage.

Workshop
--------
The workshop will be a one-day event with a morning and afternoon session. In
addition to a lunch break, long coffee breaks will be offered both in the
morning and afternoon.


A new session on open problems is proposed to spur active discussion and
interaction among the participants. A key aim of this session will be to
establish areas and identify problems of interest to the community.


Invited Speakers
----------------

* Stephen Boyd (Stanford)
* Aharon Ben-Tal (Technion)
* Ben Recht (UW Madison)

Workshop Organizers
-------------------

* Suvrit Sra, Max Planck Institute for Intelligent Systems
* Sebastian Nowozin, Microsoft Research, Cambridge, UK
* Stephen Wright, University of Wisconsin, Madison

------------------------------------------------------------------------------

Workshop

New Frontiers in Model Order Selection

Yevgeny Seldin · Yacov Crammer · Nicolò Cesa-Bianchi · Francois Laviolette · John Shawe-Taylor
Dec 15, 10:30 PM - 11:00 AM Melia Sol y Nieve: Ski

Model order selection, the trade-off between a model's complexity and its empirical data fit, is one of the fundamental questions in machine learning. It has been studied in detail in the context of supervised learning with i.i.d. samples, but has received relatively little attention beyond this domain. The goal of our workshop is to draw attention to the question of model order selection in other domains, to share ideas and approaches between domains, and to identify promising directions for future research. Our interest covers ways of defining model complexity in different domains, examples of practical problems where intelligent model order selection yields an advantage over simplistic approaches, and new theoretical tools for the analysis of model order selection. The domains of interest span all problems that cannot be directly mapped to supervised learning with i.i.d. samples, including, but not limited to, reinforcement learning, active learning, learning with delayed, partial, or indirect feedback, and learning with submodular functions.
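
As a toy instance of this trade-off in the familiar i.i.d. supervised setting (our own illustration; the workshop targets settings beyond it), one can select a polynomial degree with BIC, which penalizes model complexity against empirical fit:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 60)
    y = np.sin(3.0 * x) + 0.2 * rng.standard_normal(x.size)

    def bic(degree):
        """BIC = n log(RSS/n) + k log(n): fit term plus complexity penalty."""
        coeffs = np.polyfit(x, y, degree)
        resid = y - np.polyval(coeffs, x)
        n, k = x.size, degree + 1
        return n * np.log(np.mean(resid ** 2)) + k * np.log(n)

    best_degree = min(range(1, 15), key=bic)   # complexity vs. fit trade-off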

An example of first steps in defining the complexity of models in reinforcement learning, applying the trade-off between model complexity and empirical performance, and analyzing it can be found in [1-4]. An intriguing research direction coming out of these works is the simultaneous analysis of the exploration-exploitation and model order selection trade-offs. Such an analysis makes it possible to design and analyze models that adapt their complexity as they continue to explore and observe new data. Potential practical applications of such models include contextual bandits (for example, in the personalization of recommendations on the web [5]) and Markov decision processes.

References:
[1] N. Tishby, D. Polani. "Information Theory of Decisions and Actions", Perception-Reason-Action Cycle: Models, Algorithms and Systems, 2010.
[2] J. Asmuth, L. Li, M. L. Littman, A. Nouri, D. Wingate, "A Bayesian Sampling Approach to Exploration in Reinforcement Learning", UAI, 2009.
[3] N. Srinivas, A. Krause, S. M. Kakade, M. Seeger, "Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design", ICML, 2010.
[4] Y. Seldin, N. Cesa-Bianchi, F. Laviolette, P. Auer, J. Shawe-Taylor, J. Peters, "PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off", ICML-2011 workshop on online trading of exploration and exploitation.
[5] A. Beygelzimer, J. Langford, L. Li, L. Reyzin, R. Schapire, "Contextual Bandit Algorithms with Supervised Learning Guarantees", AISTATS, 2011.

Workshop

Sparse Representation and Low-rank Approximation

Ameet S Talwalkar · Lester W Mackey · Mehryar Mohri · Michael W Mahoney · Francis Bach · Mike Davies · Remi Gribonval · Guillaume R Obozinski
Dec 15, 10:30 PM - 11:00 AM Montebajo: Room 1

Sparse representation and low-rank approximation are fundamental tools in fields as diverse as computer vision, computational biology, signal processing, natural language processing, and machine learning. Recent advances in sparse and low-rank modeling have led to increasingly concise descriptions of high dimensional data, together with algorithms of provable performance and bounded complexity. Our workshop aims to survey recent work on sparsity and low-rank approximation and to provide a forum for open discussion of the key questions concerning these dimensionality reduction techniques. The workshop will be divided into two segments, a "sparsity segment" emphasizing sparse dictionary learning and a "low-rank segment" emphasizing scalability and large data.

The sparsity segment will be dedicated to learning sparse latent representations and dictionaries: decomposing a signal or a vector of observations as a sparse linear combination of basis vectors, atoms or covariates is ubiquitous in machine learning and signal processing. Algorithms and theoretical analyses for obtaining these decompositions are now numerous. Learning the atoms or basis vectors directly from data has proven useful in several domains and is often seen from different viewpoints: (a) as a matrix factorization problem, potentially with constraints such as pointwise nonnegativity; (b) as a latent variable model, which can be treated in a probabilistic and potentially Bayesian way, leading in particular to topic models; and (c) as dictionary learning, often with a goal of signal representation or restoration. The goal of this part of the workshop is to confront these various points of view and to foster exchanges of ideas among the signal processing, statistics, machine learning and applied mathematics communities. A minimal sketch of view (c)'s inner step, sparse coding against a fixed dictionary, follows this paragraph.
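
Here is that sketch: coding a signal against a fixed dictionary via an L1 penalty (the dictionary, sizes and parameters are hypothetical, and scikit-learn's Lasso stands in for the many algorithms the paragraph alludes to):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    D = rng.standard_normal((64, 256))         # dictionary: 256 atoms in R^64
    D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
    signal = D[:, [3, 17, 42]] @ np.array([1.0, -0.5, 2.0])   # 3-sparse signal

    # L1-penalized decomposition: signal ~ D @ code, with code sparse
    code = Lasso(alpha=0.01, fit_intercept=False).fit(D, signal).coef_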

The low-rank segment will explore the impact of low-rank methods for large-scale machine learning. Large datasets often take the form of matrices representing either a set of real-valued features for each datapoint or pairwise similarities between datapoints. Hence, modern learning problems face the daunting task of storing and operating on matrices with millions to billions of entries. An attractive solution to this problem involves working with low-rank approximations of the original matrix. Low-rank approximation is at the core of widely used algorithms such as Principal Component Analysis and Latent Semantic Indexing, and low-rank matrices appear in a variety of applications including lossy data compression, collaborative filtering, image processing, text analysis, matrix completion, robust matrix factorization and metric learning. In this segment we aim to study new algorithms, recent theoretical advances and large-scale empirical results, and more broadly we hope to identify additional interesting scenarios for use of low-rank approximations for learning tasks.
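
As a minimal sketch of the low-rank idea at scale, the fragment below approximates a truncated SVD with a randomized range finder, in the spirit of randomized low-rank methods; the matrix, rank and oversampling amount are illustrative assumptions, not a prescription.

    import numpy as np

    def randomized_svd(M, rank, n_oversample=10, rng=None):
        """Approximate rank-r SVD: sample the range of M, then factor a small matrix."""
        rng = rng or np.random.default_rng(0)
        Omega = rng.normal(size=(M.shape[1], rank + n_oversample))
        Q, _ = np.linalg.qr(M @ Omega)         # orthonormal basis for the range of M
        B = Q.T @ M                            # small projected matrix
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

    rng = np.random.default_rng(0)
    M = rng.normal(size=(2000, 50)) @ rng.normal(size=(50, 1500))  # exactly rank 50
    U, s, Vt = randomized_svd(M, rank=50, rng=rng)
    err = np.linalg.norm(M - (U * s) @ Vt) / np.linalg.norm(M)
    print(f"relative reconstruction error: {err:.2e}")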

Workshop

Integrating Language and Vision

Raymond Mooney · Trevor Darrell · Kate Saenko
Dec 15, 10:30 PM - 11:00 AM Montebajo: Library

A growing number of researchers in computer vision have started to explore how language accompanying images and video can be used to aid interpretation and retrieval, as well as train object and activity recognizers. Simultaneously, an increasing number of computational linguists have begun to investigate how visual information can be used to aid language learning and interpretation, and to ground the meaning of words and sentences in perception. However, there has been very little direct interaction between researchers in these two distinct disciplines. Consequently, researchers in each area have a quite limited understanding of the methods in the other area, and do not optimally exploit the latest ideas and techniques from both disciplines when developing systems that integrate language and vision. Therefore, we believe the time is particularly opportune for a workshop that brings together researchers in both computer vision and natural-language processing (NLP) to discuss issues and ideas in developing systems that combine language and vision.

Traditional machine learning for both computer vision and NLP requires manually annotating images, video, text, or speech with detailed labels, parse-trees, segmentations, etc. Methods that integrate language and vision hold the promise of greatly reducing such manual supervision by using naturally co-occurring text and images/video to mutually supervise each other.

There is also a wide range of important real-world applications that require integrating vision and language, including but not limited to: image and video retrieval, human-robot interaction, medical image processing, human-computer interaction in virtual worlds, and computer graphics generation.

More than any other major conference, NIPS attracts a fair number of researchers in both computer vision and computational linguistics. Therefore, we believe it is the best venue for holding a workshop that brings these two communities together for the very first time to interact, collaborate, and discuss issues and future directions in integrating language and vision.

Workshop

Machine Learning for Sustainability

Thomas Dietterich · Zico Kolter · Matthew A Brown
Dec 16, 10:30 PM - 11:00 AM Melia Sierra Nevada: Guejar

Sustainability problems pose one of the greatest challenges facing society. Humans consume more than 16TW of power, about 84% of which comes from unsustainable fossil fuels. In addition to simply being a finite resource, the carbon released from fossil fuels is a significant driver of climate change and could have a profound impact on our environment. In addition to carbon releases, humans are modifying the ecosphere in many ways that are leading to large changes in the function and structure of ecosystems. These include huge releases of nitrogen from fertilizers, the collapse and extinction of many species, and the unsustainable harvest of natural resources (e.g., fish, timber). While sustainability problems span many disciplines, several tasks in this space are fundamentally prediction, modeling, and control tasks, areas where machine learning can have a large impact. Many of these problems also require the development of novel machine learning methods, particularly methods that can scale to very large spatio-temporal problem instances.

In recent years there has been growing interest in applying machine learning to problems of sustainability, spanning applications in energy, environmental management, and climate modeling. The goal of this workshop will be to bring together researchers from both the machine learning and sustainability application fields to continue and build upon this emerging area. The talks and posters will span general discussions of sustainability issues, specific sustainability-related data sets and problem domains, and ongoing work on developing and applying machine learning techniques to these tasks.

Workshop

Philosophy and Machine Learning

Marcello Pelillo · Joachim M Buhmann · Tiberio Caetano · Bernhard Schölkopf · Larry Wasserman
Dec 16, 10:30 PM - 11:00 AM Melia Sierra Nevada: Hotel Bar

The fields of machine learning and pattern recognition can arguably be considered as a modern-day incarnation of an endeavor which has challenged mankind since antiquity. In fact, fundamental questions pertaining to categorization, abstraction, generalization, induction, etc., have been on the agenda of mainstream philosophy, under different names and guises, since its inception. With the advent of modern digital computers and the availability of enormous amounts of raw data, these questions have now taken a computational flavor: instead of asking, say, "What is a dog?", we have started asking "How can one recognize a dog?" or, more technically, "What is an algorithm to recognize a dog?". Indeed, it has even been maintained that for a philosophical theory of knowledge to be respectable, it has to be described in computational terms (Thagard, 1988).

As often happens with scientific research, in the early days of machine learning and pattern recognition there used to be a genuine interest in philosophical and conceptual issues (see, e.g., Minsky, 1961; Sutherland, 1968; Watanabe, 1969; Bongard, 1970; Nelson, 1976; Good, 1983), but over time the interest shifted almost entirely to technical and algorithmic aspects, and became driven mainly by practical applications. With this reality in mind, it is instructive to remark that although the dismissal of philosophical inquiry at times of intense incremental scientific progress is understandable, as it frees time for the immediate needs of problem-solving, it is also sometimes responsible for preventing or delaying the emergence of true scientific progress (Kuhn, 1962).

There are several points of contact between philosophy, machine learning, and pattern recognition worth exploiting. To begin, as pointed out by Duda, Hart, and Stork (2000), the very foundations of pattern recognition can be traced back to early Greek philosophers who distinguished an “essential property” from an “accidental property” of an object, so that the whole field of pattern recognition can naturally be cast as the problem of finding such essential properties of a category. As a matter of fact, during the past centuries several varieties of "essentialism" have been put forward, and it is not clear which one, if any, is being used by present-day pattern recognition research (see Gelman, 2003, for a developmental psychology perspective). Interestingly, in modern times, the very essentialist assumption has been vigorously challenged (see, e.g., James, 1890/1983; Wittgenstein, 1953; Rorty, 1979), giving rise to a relativistic position which denies the existence of essences, thereby suggesting a relational view which is reminiscent of modern link-oriented approaches to social network analysis (Kleinberg, 1998; Easley and Kleinberg, 2010) as well as of kernel- and purely similarity-based approaches to pattern analysis and recognition (see, e.g., Schölkopf and Smola, 2001; Shawe-Taylor and Cristianini, 2004; http://simbad-fp7.eu).

Besides the representation problem alluded to above, another all-important philosophical issue related to the machine learning endeavor concerns the very process of inference, and hence its connections to the philosophy of science. In fact, there are such striking analogies between the two disciplines that it has even been maintained that machine learning should be regarded as "experimental philosophy of science" (Korb, 2004). This is motivated by the observation that at the very heart of both fields there lies the notion of an inductive strategy (by way of algorithms or as they appear in scientific practice), and that the hypothesis choice in science is akin to model selection in machine learning (but see Williamson, 2009, for a more elaborate position). The connection with the philosophy of science touches upon such fundamental topics as the foundations of probability (Savage, 1972), Bayesianism and causality (Spirtes, Glymour, and Scheines, 2001; Bovens and Hartmann, 2004; Pearl, 2009; Koller and Friedman, 2009), inductionism vs. falsificationism (Popper, 1959; Lakatos, 1970), etc., each of which is on the agenda of present-day machine learning research.

Other fundamental topics which lie at the intersection of philosophy, machine learning and pattern recognition (and cognitive science as well) include: the nature of similarity and categorization (e.g., Quine, 1969; Goodman, 1972; Tversky, 1977; Lakoff, 1987; Eco, 2000; Hahn and Ramscar, 2001), (causal) decision theory (Lewis, 1981; Skyrms, 1980; Joyce, 1999), game theory (Nozick, 1994; Fudenberg and Levine, 1998; Shafer and Vovk, 2001; Cesa-Bianchi and Lugosi, 2006; Shoham and Leyton-Brown, 2009; Skyrms, 2010), and the nature of information (Watanabe, 1969; Hintikka and Suppes, 1970; Adams, 2003; Skyrms, 2010; Floridi, 2011).

In recent years there has been an increasing interest around the foundational and/or philosophical problems of machine learning and pattern recognition, from both the computer scientist's and the philosopher's camps. We mention, for example, Bob Williamson's project of "reconceiving machine learning" (http://users.cecs.anu.edu.au/~williams/rml.html), the NIPS'09 workshop on "Clustering: Science or art?" (http://stanford.edu/~rezab/nips2009workshop/) and the associated manifesto (von Luxburg, Williamson, and Guyon, 2011), the recent MIT Press book by Gilbert Harman (a philosopher) and S. Kulkarni (an engineer) on reliable inductive reasoning (Harman and Kulkarni, 2007), the ECML'2001 workshop on "Machine learning as experimental philosophy of science" (http://www.csse.monash.edu.au/~korb/posml.html) with the associated special issue of Minds and Machines (vol. 14, no. 4, 2004), the work of P. Thagard on "computational philosophy of science" (Thagard, 1988, 1990), Corfield et al.'s study on the connection between the Popper and Vapnik-Chervonenkis dimensions (Corfield, Schölkopf, and Vapnik, 2009), von Luxburg and Schölkopf's contribution in the Handbook of the History of Logic (von Luxburg and Schölkopf, 2011), Halpern and Pearl's philosophical study on "causes and explanations" (Halpern and Pearl, 2005), and O. Bousquet's blog on "machine learning thoughts" (http://ml.typepad.com/machinelearningthoughts/), to name a few examples.

This suggests that the time is ripe to attempt to establish a long-term dialogue between the philosophy and machine learning communities with a view to fostering cross-fertilization of ideas. In particular, we feel the present moment is appropriate for reflection, reassessment and eventually some synthesis, with the aim of providing the machine learning field with a self-portrait of where it currently stands and where it is going as a whole, and hopefully suggesting new directions. The aim of this workshop is precisely to consolidate research efforts in this area, and to provide an informal discussion forum for researchers and practitioners interested in this important yet diverse subject.

Accordingly, topics of interest include (but are not limited to):

- connections to epistemology and philosophy of science (inductionism, falsificationism, etc.)
- essentialism vs anti-essentialism (e.g., feature-based vs similarity/relational approaches)
- foundations of probability and causality (Bayesianism, etc.)
- abstraction and generalization
- connections to decision and game theory
- similarity and categorization
- the nature of information


References

Adams, F. (2003). The informational turn in philosophy. Minds and Machines 13(4):471–501.

Bongard, M. M. (1970). Pattern Recognition. Spartan Books, New York (original published in Russian in 1967).

Bovens, L., and Hartmann, S. (2004). Bayesian Epistemology. Oxford University Press, Oxford, UK.

Cesa-Bianchi, N., and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press, Cambridge, UK.

Corfield, D., Schölkopf, B., and Vapnik, V. (2009). Falsificationism and statistical learning theory: Comparing the Popper and the Vapnik-Chervonenkis dimensions. J. Gen. Phil. Sci. 40:51-58.

Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern Classification. John Wiley & Sons, New York.

Easley, D., and Kleinberg, J. (2010). Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, Cambridge, UK.

Eco, U. (2000). Kant and the Platypus: Essays on Language and Cognition. Harvest Books.

Floridi, L. (2011). The Philosophy of Information. Oxford University Press, Oxford, UK.

Fudenberg, D., and Levine, D. K. (1998). The Theory of Learning in Games. MIT Press, Cambridge, MA.

Gelman, S. A. (2003). The Essential Child: Origins of Essentialism in Everyday Thought. Oxford University Press, New York.

Good, I. J. (1983). The philosophy of exploratory data analysis. Phil. Sci. 50(2):283-295.

Goodman, N. (1972). Seven strictures on similarity. In: N. Goodman (Ed.), Problems and Projects. Bobs-Merrill, Indianapolis.

Hahn, U., and Ramscar, M. (Eds.) (2001). Similarity and Categorization. Oxford University Press, Oxford, UK.

Halpern, J., and Pearl, J. (2005). Causes and explanations: A structural-model approach. British J. Phil. Sci. 56:843-911.

Harman, G., and Kulkarni, S. (2007). Reliable Reasoning: Induction and Statistical Learning Theory. MIT Press, Cambridge, MA.

Hintikka, J., and Suppes, P. (Eds.) (1970). Information and Inference. Springer, Berlin.

James, W. (1983). The Principles of Psychology. Harvard University Press, Cambridge, MA (Originally published in 1890).

Joyce, J. (1999). The Foundations of Causal Decision Theory. Cambridge University Press, Cambridge, UK.

Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.

Koller, D., and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge, MA.

Korb, K. (2004). Introduction: Machine learning as philosophy of science. Minds and Machines 14(4).

Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.

Lakatos, I. (1970). Falsification and the methodology of scientific research programmes. In Lakatos, I., and Musgrave, A. (Eds). Criticism and the Growth of Knowledge. Cambridge University Press, Cambridge.

Lakoff, G. (1987). Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. The University of Chicago Press.

Lewis, D. (1981). Causal decision theory. Australasian J. Phil. 59:5–30.

Minsky, M. (1961). Steps toward artificial intelligence. Proc. IRE 49:8-30.

Nelson, R. J. (1976). On mechanical recognition. Phil. Sci. 43(1):24-52.

Nozick, R. (1994). The Nature of Rationality. Princeton University Press, Princeton, NJ.

Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge, UK (2nd edition).

Popper, K. R. (1959). The Logic of Scientific Discovery. Hutchinson & Co. (Originally published in German in 1935).

Quine, W. V. O. (1969). Natural kinds. In: Ontological Relativity and Other Essays. Columbia University Press.

Rorty, R. (1979). Philosophy and the Mirror of Nature. Princeton University Press, Princeton, NJ.

Savage, L. (1972). The Foundations of Statistics. Dover, New York (2nd edition).

Schölkopf, B., and Smola, A. (2001). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA.

Shafer, G., and Vovk, V. (2001). Probability and Finance: It's Only a Game. John Wiley & Sons, New York.

Shawe-Taylor, J., and Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK.

Shoham, Y., and Leyton-Brown, K. (2009). Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, Cambridge, UK.

Skyrms, B. (1980). Causal Necessity: A Pragmatic Investigation of the Necessity of Laws. Yale University Press, New Haven, CT.

Skyrms, B. (2010). Signals: Evolution, Learning and Information. Oxford University Press, Oxford, UK.

Spirtes, P., Glymour, C., and Scheines, R. (2001). Causation, Prediction, and Search. MIT Press, Cambridge, MA.

Sutherland, N. S. (1968). Outlines of a theory of visual pattern recognition in animals and man. Proc. Royal Soc. B 171:297-317.

Thagard, P. (1988). Computational Philosophy of Science. MIT Press, Cambridge, MA.

Thagard, P. (1990). Philosophy and machine learning. Canad. J. Phil. 20(2):261-276.

Tversky, A. (1977). Features of similarity. Psychol. Rev. 84(4):327-352.

von Luxburg, U., and Schölkopf, B. (2011). Statistical Learning Theory: Models, Concepts, and Results. In: D. Gabbay, S. Hartmann and J. Woods (Eds). Handbook of the History of Logic, vol 10: Inductive Logic. pp. 651-706. Elsevier.

von Luxburg, U., Williamson, R. C., and Guyon, I. (2011). Clustering: Science or art? (http://users.cecs.anu.edu.au/~williams/papers/P185.pdf)

Watanabe, S. (1969). Knowing and Guessing: A Quantitative Study of Inference and Information. John Wiley & Sons, New York.

Williamson, J. (2009). The philosophy of science and its relation to machine learning. In: M. M. Gaber (Ed.), Scientific Data Mining and Knowledge Discovery: Principles and Foundations. Springer, Berlin.

Wittgenstein, L. (1953). Philosophical Investigations. Blackwell Publishers.

Workshop

2nd Workshop on Computational Social Science and the Wisdom of Crowds

Winter Mason · Jennifer Wortman Vaughan · Hanna Wallach
Dec 16, 10:30 PM - 11:00 AM Telecabina: Movie Theater

Computational social science is an emerging academic research area at the intersection of computer science, statistics, and the social sciences, in which quantitative methods and computational tools are used to identify and answer social science questions. The field is driven by new sources of data from the Internet, sensor networks, government databases, crowdsourcing systems, and more, as well as by recent advances in computational modeling, machine learning, statistics, and social network analysis.

The related area of social computing deals with the mechanisms through which people interact with computational systems, examining how and why people contribute to crowdsourcing sites, and the Internet more generally. Examples of social computing systems include prediction markets, reputation systems, and collaborative filtering systems, all designed with the intent of capturing the wisdom of crowds.
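
One simple wisdom-of-crowds mechanism can be sketched in a few lines: aggregate binary judgments with each contributor weighted by the log-odds of their accuracy. In the synthetic NumPy example below the worker accuracies are assumed known for simplicity; in practice they would themselves be estimated from the votes, e.g. with EM in the style of Dawid and Skene.

    import numpy as np

    rng = np.random.default_rng(0)
    n_workers, n_items = 20, 200
    truth = rng.integers(0, 2, size=n_items)
    skill = rng.uniform(0.55, 0.95, size=n_workers)    # per-worker accuracy

    # Each worker labels each item correctly with probability equal to their skill.
    votes = np.where(rng.random((n_workers, n_items)) < skill[:, None],
                     truth, 1 - truth)

    # Weight each worker by the log-odds of their accuracy and take a signed vote.
    w = np.log(skill / (1.0 - skill))
    pred = (w @ (2 * votes - 1) > 0).astype(int)

    print("unweighted majority accuracy:", np.mean((votes.mean(axis=0) > 0.5) == truth))
    print("log-odds weighted accuracy:  ", np.mean(pred == truth))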

Machine learning plays an important role in both of these research areas, but to make truly groundbreaking advances, collaboration is necessary: social scientists and economists are uniquely positioned to identify the most pertinent and vital questions and problems, as well as to provide insight into data generation, while computer scientists contribute significant expertise in developing novel, quantitative methods and tools.

The inaugural workshop brought together experts from fields as diverse as political science, psychology, economics, and machine learning, connecting researchers with common goals but disparate methods and audiences. The quality of work presented was excellent and we expect the same caliber of submissions again this year. As with last year's workshop, we hope to attract a mix of established members of the NIPS community and researchers who have never attended NIPS and will provide an entirely new perspective.

The primary goals of the workshop are to provide an opportunity for attendees to meet, interact, share ideas, establish new collaborations, and to inform the wider NIPS community about current research in computational social science and social computing. To this end, the workshop will consist of invited talks, contributed talks, a poster session, a panel session, and a dinner.

We intend for the workshop to be broad enough to cover a wide variety of problems and computational techniques. Consequently, we plan to include research on theoretical models, empirical work, and everything in between, including but not limited to:

* Automatic aggregation of opinions or knowledge

* Incentives in social computation (e.g., game-theoretic approaches)

* Prediction markets / information markets

* Studies of events and trends (e.g., in politics)

* Quality control for user generated content

* Analysis of and experiments on distributed collaboration and consensus-building, including crowdsourcing (e.g., Mechanical Turk) and peer-production systems (e.g., Wikipedia and Yahoo! Answers)

* Group dynamics and decision-making

* Modeling network interaction content (e.g., text analysis of blog posts, tweets, emails, chats, etc.)

* Social networks

* Games with a purpose

The workshop will address the following specific goals:

* Identify and formalize open research areas.

* Propose, explore, and discuss new questions and problems.

* Discuss how best to facilitate the transfer of research ideas between the computer and social sciences.

* Direct future work and create new application areas, novel modeling approaches, and unexplored collaborative research directions.

The workshop will be announced via email and relevant mailing lists (including the ML-NEWS, UAI, COLT, PASCAL, and topic modeling lists). We will also ask the members of our interdisciplinary program committee (currently being formed) to spread the word in their own research communities. We will construct a workshop website, containing information for prospective participants and pointers to relevant research within the computer and social sciences. Accepted submissions will be made publicly available on the website.

Workshop

Domain Adaptation Workshop: Theory and Application

John Blitzer · Corinna Cortes · Afshin Rostamizadeh
Dec 16, 10:30 PM - 11:00 AM Melia Sierra Nevada: Monachil

A common assumption in theoretical models of learning such as the standard PAC model [20], as well as in the design of learning algorithms, is that training instances are drawn according to the same distribution as the unseen test examples. In practice, however, there are many cases where this assumption does not hold. There can be no hope for generalization, of course, when the training and test distributions vastly differ, but when they are less dissimilar, learning can be more successful. The main theme of this workshop is the theoretical, algorithmic, and empirical analysis of such cases where there is a mismatch between the training and test distributions. This includes the crucial scenario of domain adaptation, where the training examples are drawn from a source domain distinct from the target domain from which the test examples are extracted, and the more general scenario of multiple-source adaptation, where training instances may have been collected from multiple source domains, all distinct from the target [13]. The topic of our workshop also covers other important problems such as sample bias correction, and has tight connections with problems such as active learning, where the active distribution corresponding to the learner's labeling requests differs from the target distribution. Many other intermediate problems and scenarios appear in practice, all of which will be covered by this workshop. These problems are critical and appear in almost all real-world applications of machine learning; ignoring them can lead to dramatically poor results, and some straightforward existing solutions based on importance weighting are not always successful [5]. Which algorithms should be used for domain adaptation? Under what theoretical conditions will they be successful? How do these algorithms scale to large domain adaptation problems? These are some of the questions that the workshop aims to address. The problem of domain adaptation and the other related problems already mentioned arise in a variety of applications in natural language processing [7, 2, 10, 4, 6], speech processing [11, 8, 17, 19, 9, 18], computer vision [15], and many other areas.
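
To make the importance-weighting baseline mentioned above concrete: one common recipe estimates the density ratio p_target(x)/p_source(x) with a probabilistic classifier that discriminates the two domains, then uses the ratios as sample weights when training on source labels. The scikit-learn sketch below uses synthetic Gaussian covariate shift; the data, model choice and sizes are assumptions made for illustration, and [5] discusses when such weighting can fail.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # Source and target inputs drawn from shifted Gaussians (covariate shift).
    X_src = rng.normal(loc=0.0, size=(500, 2))
    X_tgt = rng.normal(loc=1.0, size=(500, 2))
    y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(int)

    # A domain classifier gives the density ratio up to a constant:
    # ratio is proportional to P(domain = target | x) / P(domain = source | x).
    domain = LogisticRegression().fit(
        np.vstack([X_src, X_tgt]),
        np.concatenate([np.zeros(500), np.ones(500)]))
    p = domain.predict_proba(X_src)[:, 1]
    weights = p / (1.0 - p)

    # Train the task classifier on reweighted source examples.
    clf = LogisticRegression().fit(X_src, y_src, sample_weight=weights)
    X_test = rng.normal(loc=1.0, size=(200, 2))
    y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)
    print("accuracy on target-distributed test data:", clf.score(X_test, y_test))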

The empirical performance of domain adaptation in these applications, the design of new and effective algorithms, as well as the creation of a solid theoretical framework for domain adaptation as initiated by recent work [1, 13, 12, 14, 5] are all challenging objectives for this workshop. By bringing together current experts in all aspects of this problem, we aim to foster collaborations and successful progress in this field.

Goals:
Despite the recent advances in domain adaptation, many of the most successful practical achievements in domain adaptation [3, 16, 21] have not been robust, in part because they lack formal assumptions about when they could perform well. At the same time, some of the most influential theoretical work guarantees near optimal performance in new domains, but under assumptions that may not hold in practice [1, 12, 13].

Our workshop will bridge theory and practice in the following ways:

1. We will have one applied and two theoretical invited talks.

2. We will advertise the workshop to both the applied and theoretical communities.

3. We will have discussion sessions that emphasize both the formal assumptions underlying successful practical algorithms and new algorithms built on theoretical foundations.

Workshop attendees should come away with an understanding of the domain adaptation problem, how it appears in practical applications and existing theoretical guarantees that can be provided in this more general setting. More importantly, attendees will be exposed to the important open problems of the field, which will encourage new collaborations and results.


References:

[1] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. Proceedings of NIPS 2006, 2007.

[2] J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In ACL 2007, 2007.

[3] J. Blitzer, R. McDonald, and F. Pereira. Domain adaptation with structural correspondence learning. In Conference on Empirical Methods in Natural Language Processing, Sydney, Australia, 2006.

[4] C. Chelba and A. Acero. Adaptation of maximum entropy capitalizer: Little data can help a lot. Computer Speech & Language, 20(4):382-399, 2006.

[5] C. Cortes, Y. Mansour, and M. Mohri. Learning bounds for importance weighting. In Advances in Neural Information Processing Systems (NIPS 2010), Vancouver, Canada, 2010. MIT Press.

[6] H. Daumé III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101-126, 2006.

[7] M. Dredze, J. Blitzer, P. P. Talukdar, K. Ganchev, J. Graca, and F. Pereira. Frustratingly Hard Domain Adaptation for Parsing. In CoNLL 2007, 2007.

[8] J.-L. Gauvain and C.-H. Lee. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, 2(2):291-298, 1994.

[9] F. Jelinek. Statistical Methods for Speech Recognition. The MIT Press, 1998.

[10] J. Jiang and C. Zhai. Instance Weighting for Domain Adaptation in NLP. In Proceedings of ACL 2007, pages 264-271. Association for Computational Linguistics, 2007.

[11] C. J. Leggetter and P. C. Woodland. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, pages 171-185, 1995.

[12] Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation: Learning bounds and algorithms. Conference on Learning Theory, 2009.

[13] Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation with multiple sources. In Advances in Neural Information Processing Systems (NIPS 2008), pages 1041-1048, Vancouver, Canada, 2009. MIT Press.

[14] Y. Mansour, M. Mohri, and A. Rostamizadeh. Multiple source adaptation and the Rényi divergence. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), Montréal, Canada, June 2009.

[15] A. M. Martínez. Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class. IEEE Trans. Pattern Anal. Mach. Intell., 24(6):748-763, 2002.

[16] D. McClosky, E. Charniak, and M. Johnson. Reranking and self-training for parser adaptation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 337-344. Association for Computational Linguistics, 2006.

[17] S. Della Pietra, V. Della Pietra, R. L. Mercer, and S. Roukos. Adaptive language modeling using minimum discriminant estimation. In HLT '91: Proceedings of the Workshop on Speech and Natural Language, pages 103-106, 1992.

[18] B. Roark and M. Bacchiani. Supervised and unsupervised PCFG adaptation to novel domains. In Proceedings of HLT-NAACL, 2003.

[19] R. Rosenfeld. A Maximum Entropy Approach to Adaptive Statistical Language Modeling. Computer Speech and Language, 10:187-228, 1996.

[20] L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984.

[21] G. Xue, W. Dai, Q. Yang, and Y. Yu. Topic-bridged PLSA for cross-domain text classification. In SIGIR, 2008.

Workshop

Challenges in Learning Hierarchical Models: Transfer Learning and Optimization

Quoc V. Le · Marc'Aurelio Ranzato · Russ Salakhutdinov · Josh Tenenbaum · Andrew Y Ng
Dec 16, 10:30 PM - 11:00 AM Montebajo: Library

The ability to learn abstract representations that support transfer to novel but related tasks lies at the core of many AI-related tasks, including visual object recognition, information retrieval, speech perception, and language understanding. Hierarchical models that support inferences at multiple levels have been developed and are argued to be among the most promising candidates for achieving this goal. An important property of these models is that they can extract complex statistical dependencies from high-dimensional sensory input and efficiently learn latent variables by re-using and combining intermediate concepts, allowing these models to generalize well across a wide variety of tasks.

In the past few years, researchers across many different communities, from applied statistics to engineering, computer science and neuroscience, have proposed several hierarchical models that are capable of extracting useful, high-level structured representations. The learned representations have been shown to give promising results for solving a multitude of novel learning tasks. A few notable examples of such models include Deep Belief Networks, Deep Boltzmann Machines, sparse coding-based methods, nonparametric and parametric hierarchical Bayesian models.

Despite recent successes, many existing hierarchical models are still far from being able to represent, identify and learn the wide variety of possible patterns and structure in real-world data. Existing models cannot cope with new tasks for which they have not been specifically trained. Even when applied to related tasks, trained systems often display unstable behavior. Furthermore, massive volumes of training data (e.g., data transferred between tasks) and high-dimensional input spaces pose challenging questions on how to effectively train deep hierarchical models. The recent availability of large-scale datasets (like ImageNet for visual object recognition or the Wall Street Journal corpus for large-vocabulary speech recognition), the continuous advances in optimization methods, and the availability of cluster computing have drastically changed the working scenario, calling for a re-assessment of the strengths and weaknesses of many existing optimization strategies.
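
The optimization questions raised here can be grounded with a minimal example: the minibatch stochastic gradient descent loop below trains a logistic model in NumPy. It is a toy stand-in for the deep, large-scale setting under discussion; the dataset, batch size and learning rate are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, batch, lr = 100_000, 100, 256, 0.1
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = (X @ w_true > 0).astype(float)

    w = np.zeros(d)
    for step in range(2000):
        idx = rng.integers(n, size=batch)          # sample a minibatch
        p = 1.0 / (1.0 + np.exp(-X[idx] @ w))      # sigmoid predictions
        grad = X[idx].T @ (p - y[idx]) / batch     # logistic-loss gradient
        w -= lr * grad

    print(f"training accuracy: {np.mean((X @ w > 0) == (y > 0.5)):.3f}")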

The aim of this workshop is to bring together researchers working on such hierarchical models to discuss two important challenges: the ability to perform transfer learning and the best strategies to optimize these systems on large scale problems. These problems are "large" in terms of input dimensionality (in the order of millions), number of training samples (in the order of 100 millions or more) and number of categories (in the order of several tens of thousands). During the course of the workshop, we shall be interested in discussing the following topics:

1. State of the field: What are the existing methods and what is the relationship between them? Which problems can be solved using existing learning algorithms and which require fundamentally different approaches? How are current methods optimized? Which models can scale to very high-dimensional inputs, to datasets with large number of categories and with huge number of training samples? Which models best leverage large amounts of unlabeled data?

2. Learning structured representations: How can machines extract invariant representations from a large supply of high-dimensional, highly-structured unlabeled data? How can these representations be used to represent and learn tens of thousands of different concepts (e.g., visual object categories) and expand on them without disrupting previously-learned concepts? How can these representations be used in multiple applications?

3. Transfer learning: How can previously-learned representations help learning new tasks so that less labeled supervision is needed? How can this facilitate knowledge representation for transfer learning tasks?

4. One-shot learning: For many traditional machine classification algorithms, learning curves are measured in tens, hundreds or thousands of training examples. For human learners, however, just a few training examples are often sufficient to grasp a new concept. Can we develop models that are capable of efficiently leveraging previously-learned background knowledge in order to learn novel categories from a single training example? Are there models suitable for generalizing across domains when presented with one or a few examples?

5. Scalability and success in real-world applications: How well do existing transfer learning models scale to large-scale problems including problems in computer vision, natural language processing, and speech perception? How well do these algorithms perform when applied to modeling high-dimensional real-world distributions?

6. Optimization: Which optimization methods are best for training a deep deterministic network? Which stochastic optimization algorithms are best for training probabilistic generative models? Which optimization strategies are best for training on several thousands of categories?

7. Parallel computing: Which optimization algorithms are best on GPUs, and which benefit the most from parallel computing on a cloud?

8. Theoretical Foundations: What are the theoretical guarantees of learning hierarchical models? Under what conditions is it possible to provide performance guarantees for such algorithms?

9. Suitable tasks and datasets: What are the right datasets and tasks that could be used in future research on the topic and to facilitate comparisons between methods?

In order to facilitate the discussion and to standardize results, we will invite participants to test their methods on the following two challenges.

- Transfer Learning Challenge: we will make available a dataset that has a large amount of unlabeled data and a large number of categories. The task is to categorize samples belonging to a novel category that has only few labeled training samples available. Participants will have to follow a strict training/test protocol to make results comparable. Performance is measured in terms of accuracy as well as training and test time.

- Optimization Challenge: the aim is to test several optimization algorithms to train a non-linear predictor on three large scale datasets (to perform a visual recognition task, a speech recognition task and a text categorization task). A strict protocol will be enforced to make results comparable and performance will be evaluated in terms of accuracy as well as training time both on single core machine as well as GPU and cluster.

More details: https://sites.google.com/site/nips2011workshop/

Workshop

The 4th International Workshop on Music and Machine Learning: Learning from Musical Structure

Rafael Ramirez · Darrell Conklin · Douglas Eck · Rif A. Saurous
Dec 16, 10:30 PM - 11:00 AM Melia Sierra Nevada: Dilar

Motivation
With the current explosion and quick expansion of music in digital formats, and the computational power of modern systems, research on machine learning and music is gaining increasing popularity. As the complexity of the problems investigated by researchers in machine learning and music increases, there is a need to develop new algorithms and methods to solve these problems. The focus of this workshop is on novel methods which take into account or benefit from musical structure. MML 2011 aims to build on the previous three successful MML editions, MML’08, MML’09 and MML’10.

Topic
It has been convincingly shown that many useful applications can be built using features derived from short musical snippets (chroma, MFCCs and related timbral features, augmented with tempo and beat representations). Given the great advances in these applications, higher level aspects of musical structure such as melody, harmony, phrasing and rhythm can now be given further attention, and we especially welcome contributions exploring these areas. The MML 2011 workshop intends to concentrate on machine learning algorithms employing higher level features and representations for content-based music processing.
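
For concreteness, the snippet-level features mentioned above (MFCCs, chroma, tempo and beats) can be extracted with a few library calls. The sketch below assumes the Python audio library librosa and a hypothetical local audio file; the bag-of-frames averaging at the end is one common, simple clip-level summary.

    import numpy as np
    import librosa

    # The path is a placeholder; substitute any local audio file.
    y, sr = librosa.load("example_clip.wav")

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # timbral features per frame
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)     # 12-bin harmonic content
    tempo, beats = librosa.beat.beat_track(y=y, sr=sr)   # global tempo, beat frames

    # A simple clip-level descriptor: mean frame features plus tempo.
    clip_features = np.concatenate(
        [mfcc.mean(axis=1), chroma.mean(axis=1), np.atleast_1d(tempo).ravel()])
    print(clip_features.shape)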


Papers on all applications of machine learning to music are welcome, including but not limited to automatic classification of music (audio and MIDI), style-based interpreter recognition, automatic composition and improvisation, music recommender systems, genre and tag prediction, score alignment, polyphonic pitch detection, chord extraction, pattern discovery, beat tracking, and expressive performance modeling. Audio demonstrations are encouraged when indicated by the content of the paper.

Expected Attendees
The expected attendees are active researchers in machine learning and music who have special interest in content-based music processing. We believe that this is a timely workshop because there is an increasing interest in music processing using machine learning techniques in both the ML and music communities, and that the time is ripe to start extracting, modeling and making use of higher-level features of music.

Agenda
The workshop is planned to last one full day, and will feature paper and poster presentations, panel discussions and open discussions (see proposed schedule below).
The accepted contributions will be made available from the workshop web page as soon as possible in order to encourage active discussion during the workshop. At the end of each paper session there will be time allocated for discussion. Each discussion will initially focus on the research reported in the session contributions, and then broaden to the session's general topic. At the end of the workshop there will be a dedicated session to discuss the perspectives and future directions of content-based music processing.


Call for Papers
The Call for Papers can be found at:
https://sites.google.com/site/musicmachinelearning11/

Workshop

Choice Models and Preference Learning

Jean-Marc Andreoli · Cedric Archambeau · Guillaume Bouchard · Shengbo Guo · Kristian Kersting · Scott Sanner · Martin Szummer · Paolo Viappiani · Onno Zoeter
Dec 16, 10:30 PM - 11:00 AM Montebajo: Room 1

Preference learning has been studied for several decades and has drawn increasing attention in recent years due to its importance in diverse applications such as web search, ad serving, information retrieval, recommender systems, electronic commerce, and many others. In all of these applications, we observe (often discrete) choices that reflect preferences among several entities, such as documents, webpages, products, songs, etc. Since the observations are partial, or censored, the goal is to learn the complete preference model, e.g., to reconstruct a general ordering function from observed pairwise preferences.
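
As a minimal sketch of reconstructing an ordering function from observed pairwise preferences, the NumPy fragment below fits a Bradley-Terry model by gradient ascent on synthetic choice data; the data, learning rate and iteration count are assumptions made for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    n_items = 10
    true_score = rng.normal(size=n_items)

    # Observed pairwise choices (winner, loser), sampled from the true model.
    pairs = []
    for _ in range(2000):
        i, j = rng.choice(n_items, size=2, replace=False)
        p_i = 1.0 / (1.0 + np.exp(true_score[j] - true_score[i]))
        pairs.append((i, j) if rng.random() < p_i else (j, i))

    # Fit Bradley-Terry utilities by gradient ascent on the log-likelihood.
    s = np.zeros(n_items)
    for _ in range(200):
        grad = np.zeros(n_items)
        for win, lose in pairs:
            p_win = 1.0 / (1.0 + np.exp(s[lose] - s[win]))  # current P(win beats lose)
            grad[win] += 1.0 - p_win
            grad[lose] -= 1.0 - p_win
        s += 0.5 * grad / len(pairs)

    print("learned ordering:", np.argsort(-s))
    print("true ordering:   ", np.argsort(-true_score))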

Traditionally, preference learning has been studied independently in several research areas, such as machine learning, data and web mining, artificial intelligence, recommendation systems, and psychology among others, with a high diversity of application domains such as social networks, information retrieval, web search, medicine, biology, etc. However, contributions developed in one application domain can, and should, impact other domains. One goal of this workshop is to foster this type of interdisciplinary exchange, by encouraging abstraction of the underlying problem (and solution) characteristics during presentation and discussion. In particular, the workshop is motivated by the two following lines of research:

1. Large-scale preference learning with sparse data: There has been great interest in and take-up of machine learning techniques for preference learning in learning to rank, information retrieval and recommender systems, as evidenced by the large proportion of preference-learning literature in widely regarded conferences such as SIGIR, WSDM, WWW, and CIKM. Different machine learning paradigms have been further developed and applied to these challenging problems, particularly when there is a large number of users and items but only a small set of user preferences is provided.

2. Personalization in social networks: The recent wide acceptance of social networks has brought great opportunities for services in different domains, thanks to Facebook, LinkedIn, Douban, Twitter, etc. It is important for these service providers to offer personalized services (e.g., personalization of Twitter recommendations). Social information can improve the inference of user preferences; however, it remains challenging to infer user preferences from social relationships.

As such, we especially encourage submissions on theory, methods, and applications focusing on large-scale preference learning in social media. In order to avoid a dispersed research workshop, we solicit submissions (papers, demos and project descriptions) and participation that specifically tackle the research areas below:

Preference elicitation
Ranking aggregation
Discrete choice models and inference
Statistical relational learning for preferences
Link prediction for preferences
Learning structured preferences
Multi-task preference learning


Important Dates:

Paper submission deadline: 3 November 2011 (Extended)
Author notification: 5 November 2011
Final paper due: 1 December 2011
Workshop date: 17 December 2011


Submission Instructions:

We solicit extended abstracts using the NIPS style files, preferably 2 to 4 pages, but no more than 8 pages. Submissions should include the title, authors' names, and email addresses. We will post the final version of the papers on the workshop web page and encourage authors to post their contribution on arXiv.

Papers should be submitted to the EasyChair system at https://www.easychair.org/conferences/?conf=cmpl2011.

We are seeking funds to publish the talks on http://videolectures.net/.

Workshop

Machine Learning in Computational Biology

Jean-Philippe Vert · Gunnar Rätsch · Yanjun Qi · Tomer Hertz · Anna Goldenberg · Christina Leslie
Dec 16, 10:30 PM - 11:00 AM Melia Sierra Nevada: Genil

The field of computational biology has seen dramatic growth over the past few years, both in terms of new available data, new scientific questions, and new challenges for learning and inference. In particular, biological data are often relationally structured and highly diverse, well-suited to approaches that combine multiple weak evidence from heterogeneous sources. These data may include sequenced genomes of a variety of organisms, gene expression data from multiple technologies, protein expression data, protein sequence and 3D structural data, protein interactions, gene ontology and pathway databases, genetic variation data (such as SNPs), and an enormous amount of textual data in the biological and medical literature. New types of scientific and clinical problems require the development of novel supervised and unsupervised learning methods that can use these growing resources. Furthermore, next generation sequencing technologies are yielding terabyte scale data sets that require novel algorithmic solutions.

The goal of this workshop is to present emerging problems and machine learning techniques in computational biology. We have invited several speakers from the biology/bioinformatics community who will present current research problems in bioinformatics, and we will invite contributed talks on novel learning approaches in computational biology. We encourage contributions describing either progress on new bioinformatics problems or work on established problems using methods that are substantially different from standard approaches. Kernel methods, graphical models, feature selection, and other techniques applied to relevant bioinformatics problems would all be appropriate for the workshop. The target audience is researchers with an interest in learning and its applications to relevant problems from the life sciences.
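
As one concrete example of the kernel methods mentioned above, the spectrum kernel compares sequences through their shared k-mers. The short Python sketch below computes its feature map explicitly; the sequences and the choice k=3 are illustrative assumptions, and for large k one would count only observed k-mers rather than enumerate all of them.

    import numpy as np
    from itertools import product

    def spectrum_features(seq, k=3, alphabet="ACGT"):
        """Count k-mer occurrences: the explicit feature map of the spectrum kernel."""
        index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
        v = np.zeros(len(index))
        for i in range(len(seq) - k + 1):
            v[index[seq[i:i + k]]] += 1
        return v

    s1, s2 = "ACGTACGTGGA", "ACGTTTACGTA"
    x1, x2 = spectrum_features(s1), spectrum_features(s2)
    print("spectrum kernel value:", float(x1 @ x2))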

Computational biology currently attracts great interest in the NIPS community, but there is still no yearly forum for advances in machine learning for computational biology within existing conferences in the two fields. Over the past few years, we have been working to establish this workshop as a recurring annual meeting in order to provide such a forum. In addition to having continuity among the organizers, we have enlisted a distinguished program committee to ensure that diverse work of the best quality is represented at the workshop. Typically, at least one invited speaker has been a prominent molecular biologist, with the goal of introducing the audience to emerging problems, technologies, and data sources from a biological viewpoint. We have previously organized BMC Bioinformatics special issues with work presented at the workshop, to increase the visibility of learning methods in computational biology. We have also attracted funding from the EU PASCAL2 network to support invited speakers and video recording of the talks for publication on http://videolectures.net.

Workshop

Bayesian Nonparametric Methods: Hope or Hype?

Emily Fox · Ryan Adams
Dec 16, 10:30 PM - 11:00 AM Melia Sierra Nevada: Dauro

Assessing the State of Bayesian Nonparametric Machine Learning

Bayesian nonparametric methods are an expanding part of the machine learning landscape. Proponents of Bayesian nonparametrics claim that these methods enable one to construct models that can scale their complexity with data, while representing uncertainty in both the parameters and the structure. Detractors point out that the characteristics of the models are often not well understood and that inference can be unwieldy. Relative to the statistics community, machine learning practitioners of Bayesian nonparametrics frequently do not leverage the representation of uncertainty that is inherent in the Bayesian framework. Nor do they perform the kind of empirical and theoretical analysis needed to set skeptics at ease. In this workshop we hope to bring a wide group together to constructively discuss and address these goals and shortcomings.
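
As one concrete example of a prior whose complexity scales with data, the NumPy sketch below draws mixture weights from a (truncated) Dirichlet process via the stick-breaking construction; the concentration parameter, truncation level and Gaussian atoms are illustrative assumptions. The printout shows the characteristic behavior: the number of distinct components in use grows as more data are drawn.

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, n_atoms = 2.0, 50                # concentration; truncation level

    # Stick-breaking: beta_k ~ Beta(1, alpha), pi_k = beta_k * prod_{j<k}(1 - beta_j).
    beta = rng.beta(1.0, alpha, size=n_atoms)
    pi = beta * np.concatenate([[1.0], np.cumprod(1.0 - beta[:-1])])
    atoms = rng.normal(scale=3.0, size=n_atoms)   # component means

    for n in (10, 100, 1000):
        z = rng.choice(n_atoms, size=n, p=pi / pi.sum())
        print(f"n={n}: {len(np.unique(z))} distinct components in use")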

Please see the following website for further information:
http://people.seas.harvard.edu/~rpa/nips2011npbayes.html

Workshop

Learning Semantics

Antoine Bordes · Jason Weston · Ronan Collobert · Leon Bottou
Dec 16, 10:30 PM - 11:00 AM Melia Sol y Nieve: Ski

A key ambition of AI is to render computers able to evolve in and interact with the real world. This is possible only if the machine can produce a correct interpretation of its available modalities (image, audio, text, etc.), upon which it can then reason to take appropriate actions. Computational linguists use the term "semantics" to refer to the possible interpretations (concepts) of natural language expressions, and have shown interest in "learning semantics", that is, finding these interpretations in an automated way. However, "semantics" is not restricted to the natural language modality; it is also pertinent to the speech and vision modalities. Hence, knowing visual concepts and the common relationships between them would bring a leap forward in scene analysis and image parsing, akin to the improvement that interpretations of language phrases would bring to data mining, information extraction, or automatic translation, to name a few.

Progress in learning semantics has been slow mainly because this involves sophisticated models which are hard to train, especially since they seem to require large quantities of precisely annotated training data. However, recent advances in learning with weak and limited supervision lead to the emergence of a new body of research in semantics based on multi-task/transfer learning, on learning with semi/ambiguous supervision or even with no supervision at all. The goal of this workshop is to explore these new directions and, in particular, to investigate the following questions:
- How should meaning representations be structured to be easily interpretable by a computer and still express rich and complex knowledge?
- What is a realistic supervision setting for learning semantics? How can we learn sophisticated representations with limited supervision?
- How can we jointly infer semantics from several modalities?

This workshop defines the issue of learning semantics as its main interdisciplinary subject and aims at identifying, establishing, and discussing the potential, challenges, and issues of learning semantics. The workshop is mainly organized around invited speakers to highlight several key current directions, but it also presents selected contributions and is intended to encourage the exchange of ideas with all the other members of the NIPS community.

Workshop

Machine Learning meets Computational Photography

Michael Hirsch · Stefan Harmeling · Rob Fergus · Peyman Milanfar
Dec 16, 10:30 PM - 11:00 AM Melia Sol y Nieve: Snow

In recent years, computational photography (CP) has emerged as a new field that has put forward a new understanding of how to image and display our environment. Besides addressing classical imaging problems such as deblurring or denoising by exploiting new insights and methodology from machine learning as well as computer and human vision, CP goes well beyond traditional image processing and photography.

By developing new imaging systems through innovative hardware design, CP not only aims at improving existing imaging techniques but also at developing new ways of perceiving and capturing our surroundings. However, CP does not only promise to redefine "everyday" photography; it also aims at applications in scientific imaging, such as microscopy, biomedical imaging, and astronomical imaging, and can thus be expected to have a significant impact in many research areas.

After the great success of last year's workshop on CP at NIPS, this workshop accommodates the strong interest in a follow-up expressed by many of last year's participants. The objectives of this workshop are: (i) to give an introduction to CP, present current approaches and report on the latest developments in this fast-progressing field, (ii) to spot and discuss current limitations and present open problems of CP to the NIPS community, and (iii) to encourage scientific exchange and foster interaction between researchers from machine learning, neuroscience and CP to advance the state of the art in CP.

The tight interplay between hardware and software renders CP an exciting field of research for the whole NIPS community, which could contribute in various ways to its advancement, be it by enabling new imaging devices made possible by the latest machine learning methods, or by new camera and processing designs inspired by our neurological understanding of natural visual systems.

Thus the target group of participants is researchers from the whole NIPS community (machine learning and neuroscience) and researchers working on CP and related fields.

Workshop

Discrete Optimization in Machine Learning (DISCML): Uncertainty, Generalization and Feedback

Andreas Krause · Pradeep Ravikumar · Stefanie S Jegelka · Jeffrey A Bilmes
Dec 16, 10:30 PM - 11:00 AM Melia Sol y Nieve: Slalom

Solving optimization problems with ultimately discrete solutions is becoming increasingly important in machine learning. At the core of statistical machine learning is the task of inferring conclusions from data, and when the variables underlying the data are discrete, both the task of inferring the model from data and the task of performing predictions using the estimated model are discrete optimization problems. Many of the resulting optimization problems are NP-hard, and typically, as the problem size increases, standard off-the-shelf optimization procedures become intractable.

Fortunately, most discrete optimization problems that arise in machine learning have specific structure, which can be leveraged in order to develop tractable exact or approximate optimization procedures. For example, consider the case of a discrete graphical model over a set of random variables. For the task of prediction, a key structural object is the "marginal polytope," a convex bounded set characterized by the underlying graph of the graphical model. Properties of this polytope, as well as its approximations, have been successfully used to develop efficient algorithms for inference. For the task of model selection, a key structural object is the discrete graph itself. Another problem structure is sparsity: While estimating a high-dimensional model for regression from a limited amount of data is typically an ill-posed problem, it becomes solvable if it is known that many of the coefficients are zero. Another problem structure, submodularity, a discrete analog of convexity, has been shown to arise in many machine learning problems, including structure learning of probabilistic models, variable selection and clustering. One of the primary goals of this workshop is to investigate how to leverage such structures.
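
To illustrate the last of these structures: for monotone submodular objectives, simple greedy selection enjoys a (1 - 1/e) approximation guarantee under a cardinality constraint. The Python sketch below runs the greedy pattern on a synthetic coverage objective (a standard example of a monotone submodular function); the ground set and coverage data are assumptions made for the example.

    import numpy as np

    def greedy_max(ground_set, f, k):
        """Greedily maximize a monotone submodular set function f subject to |S| <= k."""
        S = set()
        for _ in range(k):
            gains = {e: f(S | {e}) - f(S) for e in ground_set - S}
            best = max(gains, key=gains.get)
            if gains[best] <= 0:
                break
            S.add(best)
        return S

    rng = np.random.default_rng(0)
    # Coverage objective: each element covers a random subset of 100 targets.
    covers = {e: set(rng.choice(100, size=10, replace=False)) for e in range(30)}
    f = lambda S: len(set().union(*(covers[e] for e in S))) if S else 0
    S = greedy_max(set(range(30)), f, k=5)
    print("selected elements:", sorted(S), "| targets covered:", f(S))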

The focus of this year’s workshop is on the interplay between discrete optimization and machine learning: How can we solve inference problems arising in machine learning using discrete optimization? How can we solve discrete optimization problems that are themselves learned from training data? How can we solve challenging sequential and adaptive discrete optimization problems where we have the opportunity to incorporate feedback (online and active learning with combinatorial decision spaces)? We will also explore applications of such approaches in computer vision, NLP, information retrieval, etc.
