Sat Dec 10th 08:00 AM -- 06:30 PM @ Room 114
Towards an Artificial Intelligence for Data Science
Charles Sutton · James Geddes · Zoubin Ghahramani · Padhraic Smyth · Chris Williams

Machine learning methods have been applied beyond their origins in artificial intelligence to a wide variety of data analysis problems in fields such as science, health care, technology, and commerce. Previous research in machine learning, perhaps motivated by its roots in AI, has primarily aimed at fully automated approaches for prediction problems. But predictive analytics is only one step in the larger pipeline of data science, which includes data wrangling, data cleaning, exploratory visualization, data integration, model criticism and revision, and presentation of results to domain experts.

An emerging strand of work aims to address all of these challenges in one stroke by automating a greater portion of the full data science pipeline. This workshop will bring together experts in machine learning, data mining, databases and statistics to discuss the challenges that arise in the full end-to-end process of collecting data, analysing data, and making decisions, and to discuss how to build new methods that support, whether in an automated or semi-automated way, more of the full process of analysing real data.

Considering the full process of data science raises interesting questions for discussion, such as: What aspects of data analysis might potentially be automated, and what aspects seem more difficult? Statistical model building often emphasizes interpretability and human understanding, while machine learning often emphasizes predictive modeling. Are ML methods truly suitable for supporting the full data analysis pipeline? Do recent advances in ML offer help here? Finally, is there low-hanging fruit, i.e., how much time is wasted on routine tasks in scientific data analysis that could be automated?

Specific topics of interest include: data cleaning, exploratory data analysis, semi-supervised learning, active learning, interactive machine learning, model criticism, automated and semi-automated model construction, usable machine learning, interpretable prediction methods and automatic methods to explain predictions. We are especially interested in contributions that take a broader perspective, i.e., that aim toward supporting the process of data science more holistically.

09:10 AM Automated Data Cleaning via Multi-View Anomaly Detection (Talk)
Tom Dietterich
09:50 AM Automatic Discovery of the Statistical Types of Variables in a Dataset (Talk)
Isabel Valera, Zoubin Ghahramani
10:10 AM Poster spotlights (Talk)
11:00 AM Invited talk, Christian Steinruecken (Talk)
Christian Steinruecken
11:40 AM Probabilistic structure discovery in time series data (Talk)
Dave Janz, Brooks Paige, Tom Rainforth, Jan-Willem van de Meent
12:00 PM Poster session
02:00 PM Invited talk, Carlos Guestrin (Talk)
Carlos Guestrin
02:40 PM An Overview of the DARPA Data Driven Discovery of Models (D3M) Program (Talk)
Richard Lippmann, William Campbell
03:30 PM Invited talk, Frank Hutter (Talk)
Frank Hutter
04:10 PM Data Analytics as Data: A Semantic Workflow Approach (Talk)
Kristin P Bennett
04:30 PM General-Purpose Inductive Programming for Data Wrangling Automation (Talk)