[ Westin: Emerald A ]
The WWW has reached the stage where it can be looked upon as a gigantic information copying and distribution mechanism. But when the problem of distributing and copying information is essentially solved, where do we go next? There are a number of values that can be derived from this mesh that also have immediate relevance for the ML community. The goal of the workshop is to link these areas and to encourage cross-boundary thinking and working. Topics will include: (1) Machine learning and probabilistic modeling: Recommendation systems and knowledge extraction are two immediate applications, with research required for large-scale inference, modeling languages, and efficient decision making. (2) Game theory and mechanism design: When a large number of contributors is involved, how can tasks and incentive structures be designed so that the desired goal is achieved? Research is required for solving very large games, and for mechanism design under uncertainty. (3) Knowledge representation and reasoning: Large parts of the web are currently stored in an unstructured way, making linking and evaluating knowledge a complex problem. Open points are the difficulty of reasoning, the tradeoff between the efficiency of reasoning and the power of the representation, and reasoning under uncertainty. (4) Social networks and collective …
[ Hilton: Sutcliffe A ]
This workshop aims at collecting theoretical insights into the design of data-dependent learning strategies. Specifically, we are interested in how far learned prediction rules may be characterized in terms of the observations themselves. This amounts to capturing how well data can be used to construct structured hypothesis spaces for risk minimization strategies - termed empirical hypothesis spaces. Classical analysis of learning algorithms requires the user to define a proper hypothesis space before seeing the data. In practice, however, one often decides on the proper learning strategy or the form of the prediction rules of interest after inspection of the data. This theoretical gap constitutes exactly the scope of this workshop.
[ Westin: Alpine E ]
Cost-sensitive learning aims to minimize the data acquisition cost while maximizing the accuracy of the learner/predictor. Many sub-fields in machine learning such as semi-supervised learning, active label/feature acquisition, cascaded classification, and inductive transfer are motivated by the need to minimize the cost of data acquisition in various application domains. These approaches typically attempt to minimize data acquisition costs under strong simplifying assumptions -- e.g., feature vectors are assumed to have zero cost in semi-supervised learning. Although all of these areas have felt the need for a principled solution to minimize data costs, until recently the acquisition cost has rarely been modeled directly. Despite some recent work in this area, much more research is needed on this important topic. It is also important to ensure that the theoretical work addresses the practical needs of several application communities, such as computer-aided medical diagnosis, signal processing, remote sensing, and computer vision. We hope to bring together researchers from semi-supervised learning, active label/feature acquisition, inductive transfer learning, cascaded classification and other theoretical areas with practitioners from various application domains. We welcome both novel theory/algorithms and contributions that draw attention to open problems and challenges in real-world applications which call for cost-sensitive learning.
[ Hilton: Sutcliffe B ]
Structured data emerges rapidly in a large number of disciplines: bioinformatics, systems biology, social network analysis, natural language processing and the Internet generate large collections of strings, graphs, trees, and time series. Designing and analysing algorithms for dealing with these large collections of structured data has turned into a major focus of machine learning over recent years, both in the input and output domain of machine learning algorithms, and is starting to enable exciting new applications of machine learning. The goal of this workshop is to bring together experts on learning with structured input and structured output domains and its applications, in order to exchange the latest developments in these growing fields. The workshop will include one session on learning with structured inputs, featuring a keynote by Prof. Eric Xing from Carnegie Mellon University. A second session will focus on learning with structured outputs, with a keynote by Dr. Yasemin Altun from the MPI for Biological Cybernetics. A third session will present novel applications of structured input-structured output learning to real-world problems.
[ Hilton: Diamond Head ]
Classical optimization techniques have found widespread use in machine learning. Convex optimization has occupied center stage, and significant effort is still devoted to it. New problems constantly emerge in machine learning, e.g., structured learning and semi-supervised learning, while at the same time fundamental problems such as clustering and classification continue to be better understood. Moreover, machine learning is now very important for real-world problems with massive datasets, streaming inputs, the need for distributed computation, and complex models. These challenging characteristics of modern problems and datasets indicate that we must go beyond the "traditional optimization" approaches common in machine learning. What is needed is optimization "tuned" for machine learning tasks. For example, techniques such as non-convex optimization (for semi-supervised learning, sparsity constraints), combinatorial optimization and relaxations (structured learning), stochastic optimization (massive datasets), decomposition techniques (parallel and distributed computation), and online learning (streaming inputs) are relevant in this setting. These techniques naturally draw inspiration from other fields, such as operations research, polyhedral combinatorics, theoretical computer science, and the optimization community.
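The stochastic-optimization point above can be illustrated with a minimal Python sketch (the data, model, and step size are hypothetical illustrations, not any workshop contribution): stochastic gradient descent fits a model from a stream of examples using one-sample gradient estimates, rather than requiring full passes over a massive dataset.

```python
import random

random.seed(0)

# Hypothetical streaming data: y = 2*x + small Gaussian noise.
xs = [random.uniform(-1, 1) for _ in range(500)]
data = [(x, 2.0 * x + random.gauss(0, 0.1)) for x in xs]

w = 0.0     # single parameter to fit
eta = 0.1   # step size (an assumption; in practice it is tuned or decayed)
for x, y in data:
    grad = 2.0 * (w * x - y) * x   # gradient of (w*x - y)**2 for this one example
    w -= eta * grad

print(w)  # close to the true slope 2.0
```

Each update touches a single example, which is exactly what makes this style of optimization attractive for massive or streaming datasets.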
[ Westin: Alpine CD ]
We believe that the widespread adoption of open source software policies will have a tremendous impact on the field of machine learning. The goal of this workshop is to further support the current developments in this area and to provide fresh impetus for it. Following the success of the inaugural NIPS-MLOSS workshop held at NIPS 2006, the Journal of Machine Learning Research (JMLR) has started a new track for machine learning open source software, initiated by the workshop's organizers. Many prominent machine learning researchers have co-authored a position paper advocating the need for open source software in machine learning. Furthermore, the workshop's organizers have set up a community website, mloss.org, where people can register their software projects, rate existing projects and initiate discussions about projects and related topics. This website currently lists 132 such projects, including many prominent projects in the area of machine learning. The main goal of this workshop is to bring together the main practitioners in the area of machine learning open source software in order to initiate processes which will help to further improve the development of this area. In particular, we have to move beyond a mere collection of more or less unrelated software projects and …
[ Westin: Nordic ]
Machine learning has traditionally been focused on prediction. Given observations that have been generated by an unknown stochastic dependency, the goal is to infer a law that will be able to correctly predict future observations generated by the same dependency. Statistics, in contrast, has traditionally focused on data modeling, i.e., on the estimation of a probability law that has generated the data. During recent years, the boundaries between the two disciplines have become blurred and both communities have adopted methods from the other; however, it is probably fair to say that neither of them has yet fully embraced the field of causal modeling, i.e., the detection of causal structure underlying the data. Since the Eighties there has been a community of researchers, mostly from statistics and philosophy, who have developed methods aiming at inferring causal relationships from observational data. While this community has remained relatively small, it has recently been complemented by a number of researchers from machine learning. The goal of this workshop is to discuss new approaches to causal discovery from empirical data, their applications and methods to evaluate their success. Emphasis will be put on the definition of objectives to be reached and assessment methods to evaluate …
[ Westin: Callaghan ]
There has recently been a surge of interest in algebraic methods in machine learning. In no particular order, this includes: new approaches to ranking problems; the budding field of algebraic statistics; and various applications of non-commutative Fourier transforms. The aim of the workshop is to bring together these distinct communities, explore connections, and showcase algebraic methods to the machine learning community at large. AML'08 is intended to be accessible to researchers with no prior exposure to abstract algebra. The program includes three short tutorials that will cover the basic concepts necessary for understanding cutting edge research in the field.
[ Hilton: Mt. Currie S ]
The field of computational biology has seen dramatic growth over the past few years, in terms of newly available data, new scientific questions, and new challenges for learning and inference. In particular, biological data is often relationally structured and highly diverse, well-suited to approaches that combine weak evidence from multiple heterogeneous sources. These data may include sequenced genomes of a variety of organisms, gene expression data from multiple technologies, protein expression data, protein sequence and 3D structural data, protein interactions, gene ontology and pathway databases, genetic variation data (such as SNPs), and an enormous amount of textual data in the biological and medical literature. New types of scientific and clinical problems require the development of novel supervised and unsupervised learning methods that can use these growing resources. The goal of this workshop is to present emerging problems and machine learning techniques in computational biology. We invited several speakers from the biology/bioinformatics community who will present current research problems in bioinformatics, and we invite contributed talks on novel learning approaches in computational biology. We encourage contributions describing either progress on new bioinformatics problems or work on established problems using methods that are substantially different from standard approaches. Kernel methods, graphical …
[ Hilton: Black Tusk ]
Can statistical machine learning theories and algorithms help explain human learning? Broadly speaking, machine learning studies the fundamental laws that govern all learning processes, including both artificial systems (e.g., computers) and natural systems (e.g., humans). It has long been understood that theories and algorithms from machine learning are relevant to understanding aspects of human learning. Human cognition also carries potential lessons for machine learning research, since people still learn languages, concepts, and causal relationships from far less data than any automated system. There is a rich opportunity to develop a general theory of learning which covers both machines and humans, with the potential to deepen our understanding of human cognition and to take insights from human learning to improve machine learning systems. The goal of this workshop is to bring together the different communities that study machine learning, cognitive science, neuroscience and educational science. We will investigate the value of advanced machine learning theories and algorithms as computational models for certain human learning behaviors, including, but not limited to, the role of prior knowledge, learning from labeled and unlabeled data, learning from active queries, and so on. We also wish to explore the insights from the cognitive study of human …
[ Hilton: Mt. Currie N ]
It is well known that sensory and motor information is represented in the activity of large populations of neurons. Encoding and decoding this information is a matter of active debate. One way to study population coding is to analyze response dependencies. Rate and temporal coding are opposing theories of neural coding. Dependency concepts for these theories are rate covariance and temporal spike coordination. In the typical theoretical framework, response dependencies are characterized by correlation coefficients and cross-correlograms. The main goal of this workshop is to challenge the dependency concepts that are typically applied and to disseminate more sophisticated concepts to a wider public. It will bring together experts from different fields and encourage exchange of insights between experimentalists and theoreticians.
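For concreteness, the cross-correlogram mentioned above can be sketched in a few lines of Python; the spike times below are hypothetical, chosen so that train B fires about 2 ms after train A.

```python
# Cross-correlogram between two spike trains: a histogram of pairwise
# spike-time differences within a +/-5 ms window. Spike times (in ms)
# are hypothetical illustrations.
train_a = [10, 50, 90, 130, 170]
train_b = [12, 52, 92, 132, 172]

window = 5    # ms on each side of zero lag
binsize = 1   # ms per bin
nbins = 2 * window // binsize
counts = [0] * nbins
for ta in train_a:
    for tb in train_b:
        lag = tb - ta
        if -window <= lag < window:
            counts[(lag + window) // binsize] += 1

# The peak of the histogram gives the dominant lag of B relative to A.
peak_lag = counts.index(max(counts)) * binsize - window
print(peak_lag)  # → 2 (train B follows train A by about 2 ms)
```

Such second-order summaries (correlation coefficients, cross-correlograms) are exactly the dependency concepts the workshop proposes to challenge and refine.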
[ Westin: Alpine AB ]
Recently, statistics and machine learning have seen the proliferation of both theoretical and computational tools for analyzing graphs to support progress in applied domains such as social sciences, biology, medicine, neuroscience, physics, finance, and economics. This workshop actively promotes a concerted effort to address statistical, methodological and computational issues that arise when modeling and analyzing large collections of data which are primarily represented as static and/or dynamic graphs. Presentations include (but are not limited to) novel graph models, the application of established models to new domains, theoretical and computational issues, limitations of current graph methods and directions for future research. The workshop aims to bring together researchers from applied disciplines such as sociology, economics, medicine and biology with researchers from mathematics, physics, statistics and computer science.
[ Hilton: Cheakamus ]
This workshop is intended for researchers interested in machine learning methods for speech and language processing and in unifying approaches to several outstanding speech and language processing issues. In the last few years, significant progress has been made in both research and commercial applications of speech and language processing. Despite these strong empirical results, however, important theoretical issues remain to be addressed. Theoretical advancement is expected to drive further system performance improvement, which in turn generates the need for in-depth studies of emerging novel learning and modeling methodologies. The main goal of this workshop is to address that need, with a focus on the fundamental issues of new emerging approaches and empirical applications in speech and language processing. Another focus of this workshop is the unification of learning approaches to speech and language processing problems. Many problems in speech processing and in language processing share a wide range of similarities (despite conspicuous differences), and techniques in the speech and language processing fields can be successfully cross-fertilized. It is of great interest to study unifying modeling and learning approaches across these two fields. In summary, we hope that this workshop will present an opportunity for …
[ Westin: Glacier ]
A significant emphasis in trying to achieve adaptation and learning in the perception-action cycle of agents lies in the development of suitable algorithms. While these algorithms partly result from mathematical constructions, in modern research much attention is given to methods that mimic biological processes. However, mimicking the apparent features of what appears to be a biologically relevant mechanism makes it difficult to separate the essentials of adaptation and learning from accidents of evolution. This is a challenge both for the understanding of biological systems and for the design of artificial ones. Therefore, recent work is increasingly concentrating on identifying general principles rather than individual mechanisms for biologically relevant information processing. One advantage is that a small selection of principles can give rise to a variety of - effectively equivalent - mechanisms. The ultimate goal is to attain a more transparent and unified view of the phenomena in question. Possible candidates for such principles governing the dynamics of the perception-action cycle include, but are not limited to, information theory, Bayesian models, energy-based concepts and group-theoretical principles. The workshop aims at bringing together various principle-based directions for the investigation of various aspects of the perception-action cycle and at identifying promising …
[ Hilton: Sutcliffe B ]
Statistical learning methods have become mainstream in the analysis of Functional Magnetic Resonance Imaging (fMRI) data, spurred on by a growing consensus that meaningful neuro-scientific models built from fMRI data should be capable of accurate predictions of behavior or neural functioning. These approaches have convinced most neuroscientists that there is tremendous potential in the decoding of brain states using statistical learning. Along with this realization, though, has come a growing recognition of the limitations inherent in using black-box prediction methods for drawing neuro-scientific interpretations. The primary challenge now is how best to exploit statistical learning to answer scientific questions by incorporating domain knowledge and embodying hypotheses about cognitive processes into our models. Further advances will require resolution of many open questions, including: 1) Variability/Robustness: to what extent do patterns in fMRI replicate across trials, subjects, tasks, and studies? To what extent are processes that are observable through the fMRI BOLD response truly replicable across these different conditions? How similar is the neural functioning of one subject to another? 2) Representation: the most common data representation continues to consider voxels as static and independent, and examples are i.i.d.; however, activation patterns almost surely do not lie in voxel space. What are …
[ Westin: Alpine E ]
While the machine learning community has primarily focused on analysing the output of a single data source, there have been relatively few attempts to develop a general framework, or heuristics, for analysing several data sources in terms of a shared dependency structure. Learning from multiple data sources (or alternatively, the data fusion problem) is a timely research area. Due to the increasing availability and sophistication of data recording techniques and advances in data analysis algorithms, there exist many scenarios in which it is necessary to model multiple, related data sources, e.g. in fields such as bioinformatics, multi-modal signal processing, information retrieval, sensor networks etc. The open question is to find approaches to analyse data which consist of more than one set of observations (or views) of the same phenomenon. In general, existing methods use a discriminative approach, where a set of features for each data set is found in order to explicitly optimise some dependency criterion. However, a discriminative approach may result in an ad hoc algorithm, may require regularisation to ensure erroneous shared features are not discovered, and makes it difficult to incorporate prior knowledge about the shared information. A possible way to overcome these problems is a generative …
[ Hilton: Sutcliffe A ]
Interest in parallel hardware concepts, including multicore, specialized hardware, and multimachine, has recently increased as researchers have looked to scale up their concepts to large, complex models and large datasets. In this workshop, a panel of invited speakers will present results of investigations into hardware concepts for accelerating a number of different learning and simulation algorithms. Additional contributions will be presented in poster spotlights and a poster session at the end of the one-day workshop. Our intent is to provide a broad survey of the space of hardware approaches in order to capture the current state of activity in this venerable domain of study. Approaches to be covered include silicon, FPGA, and supercomputer architectures, for applications such as Bayesian network models of large and complex domains, simulations of cortex and other brain structures, and large-scale probabilistic algorithms.
[ Westin: Callaghan ]
Reinforcement Learning (RL) problems are typically formulated in terms of Stochastic Decision Processes (SDPs), or a specialization thereof, Markovian Decision Processes (MDPs), with the goal of identifying an optimal control policy. In contrast to planning problems, RL problems are characterized by the lack of complete information concerning the transition and reward models of the SDP. Hence, algorithms for solving RL problems need to estimate properties of the system from finite data. Naturally, any such estimated quantity has inherent uncertainty. One of the interesting and challenging aspects of RL is that the algorithms have partial control over the data sample they observe, allowing them to actively control the amount of this uncertainty and potentially trade it off against performance. Reinforcement Learning, as a field of research, has over the past few years seen renewed interest in methods that explicitly consider the uncertainties inherent to the learning process. Indeed, interest in data-driven models that take uncertainties into account goes beyond RL to the fields of Control Theory, Operations Research and Statistics. Within the RL community, relevant lines of research include Bayesian RL, risk-sensitive and robust dynamic decision making, RL with confidence intervals, and applications of risk-aware and uncertainty-aware decision-making. The goal …
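As a minimal illustration of learning without the transition and reward models, the following Python sketch runs epsilon-greedy Q-learning on a hypothetical two-state MDP (the MDP, step size, and exploration rate are illustrative assumptions, not taken from the abstract). The agent only ever samples transitions, and its exploration rate is precisely the "partial control over the data sample" mentioned above.

```python
import random

random.seed(1)

# Hypothetical deterministic MDP: action 1 in state 0 moves to state 1,
# and action 1 in state 1 yields reward 1 and returns to state 0.
# The learner never reads this function's logic; it only samples it.
def step(state, action):
    if state == 0:
        return (1, 0.0) if action == 1 else (0, 0.0)
    return (0, 1.0) if action == 1 else (1, 0.0)

Q = [[0.0, 0.0], [0.0, 0.0]]
alpha, gamma, eps = 0.1, 0.9, 0.2   # learning rate, discount, exploration (assumed)
state = 0
for _ in range(5000):
    # Epsilon-greedy: exploration controls which data are observed.
    if random.random() < eps:
        action = random.randrange(2)
    else:
        action = max((0, 1), key=lambda a: Q[state][a])
    nxt, reward = step(state, action)
    Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
    state = nxt

# The greedy policy should choose action 1 in both states.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in (0, 1)]
print(policy)
```

The estimated Q-values carry sampling uncertainty; the Bayesian and confidence-interval methods named above model that uncertainty explicitly rather than relying on a point estimate as this sketch does.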
[ Hilton: Black Tusk ]
The ultimate performance measure of an animal's brain is whether it can produce adequate behaviour to increase its species' fitness. A basic characteristic of biological systems is the variability of behaviour. Variability can be observed across many levels of biological organisation: from movement in humans, through the responses of cellular networks to repeated identical stimulations, to the interaction of biomolecules. Variability has, therefore, emerged as a key ingredient in understanding computational and biological mechanisms in the brain (Faisal et al., 2008, Nature Rev Neurosci). Advances in experimental methods have increased the availability, amount and quality of behavioural data for both humans and animals. Yet most behavioural studies lack adequate quantitative methods to model behaviour and its variability in a natural manner. Existing approaches rely on simple experiments with straightforward interpretations and subjectively defined behavioural performance indicators, often averaging out meaningful variability. Thus, a major challenge in analyzing behavior is to discover some underlying simplicity in a complex stream of behavioral actions. The gain of such an analysis is that the underlying simplicity is often a reflection of the mechanism driving behavior.
[ Westin: Alpine CD ]
Probabilistic graphical models provide a formal lingua franca for modeling and a common target for efficient inference algorithms. Their introduction gave rise to an extensive body of work in machine learning, statistics, robotics, vision, biology, neuroscience, AI and cognitive science. However, many of the most innovative and exciting probabilistic models published by the NIPS community far outstrip the representational capacity of graphical models and are instead communicated using a mix of natural language, pseudo code, and mathematical formulae, and solved using special-purpose, one-off inference methods. Very often, graphical models are used only to describe the coarse, high-level structure rather than the precise specification necessary for automated inference. Probabilistic programming languages aim to close this representational gap: users specify a probabilistic model in its entirety (e.g., by writing code that generates a sample from the joint distribution), and inference follows automatically given the specification. Several existing systems already realize this goal with varying degrees of expressiveness, compositionality, universality, and efficiency. We believe that the probabilistic programming language approach, which has been emerging over the last 10 years from a range of diverse fields including machine learning, computational statistics, systems biology, probabilistic AI, mathematical logic, theoretical computer science and programming …
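The idea of a model specified entirely as sampling code can be sketched in a few lines of Python (a hypothetical toy model, not any of the systems alluded to above): the "program" samples from the joint distribution, and a generic, if inefficient, inference procedure conditions on observed data without any model-specific derivation.

```python
import random

random.seed(0)

# A tiny "probabilistic program": the model is ordinary code that samples
# from the joint distribution over (coin bias, five flips). The model,
# prior, and data are hypothetical illustrations.
def model():
    bias = random.random()                            # uniform prior on the bias
    flips = [random.random() < bias for _ in range(5)]
    return bias, flips

# Generic inference by rejection sampling: run the program many times and
# keep only the runs whose simulated flips match the observed ones. This
# works for any such program, however inefficiently.
observed = [True, True, True, True, False]
accepted = [bias for bias, flips in (model() for _ in range(100000))
            if flips == observed]

# With a uniform prior and 4 heads / 1 tail, the exact posterior mean is 5/7.
posterior_mean = sum(accepted) / len(accepted)
print(posterior_mean)
```

Rejection sampling is only the simplest possible backend; the efficiency dimension mentioned above is exactly about replacing it with smarter general-purpose inference while keeping the model-as-program interface.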
[ Hilton: Cheakamus ]
Natural language processing (NLP) models must deal with the complex structure and ambiguity present in human languages. Because labeled data is unavailable for many domains, languages, and tasks, supervised learning approaches only partially address these challenges. In contrast, unlabeled data is cheap and plentiful, making unsupervised approaches appealing. Moreover, in recent years, we have seen exciting progress in unsupervised learning for many NLP tasks, including unsupervised word segmentation, part-of-speech and grammar induction, discourse analysis, coreference resolution, document summarization, and topic induction. The goal of this workshop is to bring together researchers from the unsupervised machine learning community and the natural language processing community to facilitate cross-fertilization of techniques, models, and applications. The workshop focus is on the unsupervised learning of latent representations for natural language and speech. In particular, we are interested in structured prediction models which are able to discover linguistically sophisticated patterns from raw data. To provide a common ground for comparison and discussion, we will provide a cleaned and preprocessed data set for the convenience of those who would like to participate. This data will contain part-of-speech tags and parse trees in addition to raw sentences. An exciting direction in unsupervised NLP is the use of parallel text in multiple languages to provide additional structure on unsupervised learning. To that end, we will provide a bilingual corpus with word alignments, …
[ Westin: Nordic ]
Kernel methods are widely used to address a variety of learning tasks including classification, regression, ranking, clustering, and dimensionality reduction. The appropriate choice of a kernel is often left to the user. But poor selections may lead to sub-optimal performance. Furthermore, searching for an appropriate kernel manually may be a time-consuming and imperfect art. Instead, the kernel selection process can be included as part of the overall learning problem. In this way, better performance guarantees can be given and the kernel selection process can be made automatic. In this workshop, we will be concerned with using sampled data to select or learn a kernel function or kernel matrix appropriate for the specific task at hand. We will discuss several scenarios, including classification, regression, and ranking, where the use of kernels is ubiquitous, and different settings including inductive, transductive, or semi-supervised learning. We also invite discussions on the closely related fields of feature selection and extraction, and are interested in exploring further the connection with these topics. The goal is to cover all questions related to the problem of learning kernels: different problem formulations, the computational efficiency and accuracy of the algorithms that address these problems and their different strengths and …
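One simple criterion from the kernel-learning literature is kernel-target alignment, which scores a candidate Gram matrix K against the label outer product: A(K, yy^T) = <K, yy^T>_F / (||K||_F ||yy^T||_F). A minimal Python sketch on a hypothetical 1-D dataset (the data, labels, and candidate kernels are illustrative assumptions):

```python
import math

# Hypothetical 1-D dataset where labels follow the sign of x.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [-1, -1, -1, 1, 1, 1]

def gram(kernel):
    return [[kernel(a, b) for b in xs] for a in xs]

def alignment(K):
    n = len(xs)
    num = sum(K[i][j] * ys[i] * ys[j] for i in range(n) for j in range(n))
    k_norm = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    y_norm = float(n)   # ||yy^T||_F = n when labels are +/-1
    return num / (k_norm * y_norm)

candidates = {
    "linear": lambda a, b: a * b,
    "rbf(width=1)": lambda a, b: math.exp(-(a - b) ** 2),
}
scores = {name: alignment(gram(k)) for name, k in candidates.items()}
best = max(scores, key=scores.get)
print(best, {name: round(s, 3) for name, s in scores.items()})
```

Alignment is only one of the formulations the workshop covers, but it shows how "learning the kernel" can be posed as an ordinary data-driven optimization over candidate kernels.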
[ Hilton: Diamond Head ]
There are around 100,000 neurons and about one billion synapses under a mm^2 of cerebral cortex. Thalamic inputs to the cortex carry information that is transformed by local microcircuits of excitatory and inhibitory neurons. In recent years there has been an explosion of discoveries about the anatomical organization of these microcircuits and the physiological properties of the neurons and synapses that compose them. The goal of this workshop is to explore the functional implications of these new findings, and in particular to attempt to characterize the elementary computational operations that are performed in different layers of cortex.
[ Westin: Alpine AB ]
Graphical models have become a key tool in representing multi-variate distributions in many machine learning applications. They have been successfully used in diverse fields such as machine vision, bioinformatics, natural language processing, reinforcement learning and many others. Approximate inference in such models has attracted a great deal of interest in the learning community, and many algorithms have been introduced in recent years, with a specific emphasis on inference in discrete variable models. These new methods explore new and exciting links between inference, combinatorial optimization, and convex duality. They provide new avenues for designing and understanding message passing algorithms, and can give theoretical guarantees when used for learning graphical models. The goal of this workshop is to assess the current state of the field and explore new directions. We shall specifically be interested in understanding the following issues: 1. State of the field: What are the existing methods, and how do they relate to each other? Which problems can be solved using existing algorithms, and which cannot? 2. Quality of approximations: What are the theoretical guarantees regarding the output of the approximate inference algorithms (e.g., upper or lower bounds on MAP or marginals, optimality within a given factor, certificates of optimality etc.). How do these depend on the complexity of the inference algorithms (i.e., what is the tradeoff between running time and …
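The message-passing algorithms discussed above can be illustrated with a minimal Python sketch of sum-product on a chain of binary variables, where the messages yield exact marginals (the potentials are hypothetical; approximate methods apply the same local updates to graphs with loops, where exactness is lost):

```python
# Sum-product message passing on a 3-node chain of binary variables.
# Hypothetical potentials: node 1 prefers state 0, and the shared edge
# potential favours agreement between neighbours.
n = 3
unary = [[1.0, 1.0], [2.0, 1.0], [1.0, 1.0]]   # node potentials
pair = [[2.0, 1.0], [1.0, 2.0]]                # edge potential (same=2, differ=1)

# Forward messages: m_f[i][x] is the message arriving at node i from the left.
m_f = [[1.0, 1.0] for _ in range(n)]
for i in range(1, n):
    m_f[i] = [sum(unary[i-1][xp] * m_f[i-1][xp] * pair[xp][x] for xp in (0, 1))
              for x in (0, 1)]

# Backward messages, symmetrically from the right.
m_b = [[1.0, 1.0] for _ in range(n)]
for i in range(n - 2, -1, -1):
    m_b[i] = [sum(unary[i+1][xn] * m_b[i+1][xn] * pair[x][xn] for xn in (0, 1))
              for x in (0, 1)]

def marginal(i):
    # Belief at node i: local potential times both incoming messages, normalized.
    b = [unary[i][x] * m_f[i][x] * m_b[i][x] for x in (0, 1)]
    z = sum(b)
    return [v / z for v in b]

print([marginal(i) for i in range(n)])
```

On a tree these marginals are exact; the bounds and certificates mentioned in point 2 concern what can still be guaranteed when the same updates run on loopy graphs.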