

Workshop
Fri Dec 09 11:00 PM -- 09:30 AM (PST) @ Room 114
Towards an Artificial Intelligence for Data Science
Charles Sutton · James Geddes · Zoubin Ghahramani · Padhraic Smyth · Chris Williams






Machine learning methods have been applied beyond their origins in artificial intelligence to a wide variety of data analysis problems in fields such as science, health care, technology, and commerce. Previous research in machine learning, perhaps motivated by its roots in AI, has primarily aimed at fully automated approaches for prediction problems. But predictive analytics is only one step in the larger pipeline of data science, which includes data wrangling, data cleaning, exploratory visualization, data integration, model criticism and revision, and presentation of results to domain experts.


An emerging strand of work aims to address all of these challenges in one stroke by automating a greater portion of the full data science pipeline. This workshop will bring together experts in machine learning, data mining, databases, and statistics to discuss the challenges that arise in the full end-to-end process of collecting data, analysing it, and making decisions, and to discuss new methods that support more of the full process of analysing real data, whether in an automated or a semi-automated way.


Considering the full process of data science raises interesting questions for discussion, such as: What aspects of data analysis might potentially be automated, and what aspects seem more difficult? Statistical model building often emphasizes interpretability and human understanding, while machine learning often emphasizes predictive modeling. Are ML methods truly suitable for supporting the full data analysis pipeline? Do recent advances in ML offer help here? Finally, is there low-hanging fruit? For example, how much time is wasted on routine tasks in scientific data analysis that could be automated?

Specific topics of interest include: data cleaning, exploratory data analysis, semi-supervised learning, active learning, interactive machine learning, model criticism, automated and semi-automated model construction, usable machine learning, interpretable prediction methods and automatic methods to explain predictions. We are especially interested in contributions that take a broader perspective, i.e., that aim toward supporting the process of data science more holistically.

Automated Data Cleaning via Multi-View Anomaly Detection (Talk)
Automatic Discovery of the Statistical Types of Variables in a Dataset (Talk)
Poster spotlights (Talk)
Invited talk, Christian Steinruecken (Talk)
Probabilistic structure discovery in time series data (Talk)
Poster session
Invited talk, Carlos Guestrin (Talk)
An Overview of the DARPA Data Driven Discovery of Models (D3M) Program (Talk)
Invited talk, Frank Hutter (Talk)
Data Analytics as Data: A Semantic Workflow Approach (Talk)
General-Purpose Inductive Programming for Data Wrangling Automation (Talk)