Data science is a field of evidence-seeking that combines data with domain information to generate new knowledge. It addresses key considerations in AI regarding when and where data-driven solutions are reliable and appropriate. Such considerations require involvement from humans who collectively understand the domain and tools used to collect, process, and model data. Throughout the data science life cycle, these humans make judgment calls to extract information from data. Veridical data science seeks to ensure that this information is reliable, reproducible, and clearly communicated so that empirical evidence may be evaluated in the context of human decisions. The three core principles of predictability, computability, and stability (PCS) provide the foundation for veridical data science. In this talk, we will present a unified PCS framework for data analysis, consisting of both a workflow and documentation, illustrated through iterative random forests and case studies from genomics and precision medicine.
Bin Yu (UC Berkeley)
Bin Yu is Chancellor’s Professor in the Departments of Statistics and of Electrical Engineering & Computer Sciences at the University of California at Berkeley and a former chair of Statistics at UC Berkeley. Her research focuses on the practice, algorithms, and theory of statistical machine learning and causal inference. Her group is engaged in interdisciplinary research with scientists from genomics, neuroscience, and precision medicine. In order to augment empirical evidence for decision-making, they are investigating methods/algorithms (and associated statistical inference problems) such as dictionary learning, non-negative matrix factorization (NMF), EM and deep learning (CNNs and LSTMs), and heterogeneous effect estimation in randomized experiments (X-learner). Their recent algorithms include staNMF for unsupervised learning, iterative Random Forests (iRF) and signed iRF (s-iRF) for discovering predictive and stable high-order interactions in supervised learning, and contextual decomposition (CD) and aggregated contextual decomposition (ACD) for phrase or patch importance extraction from an LSTM or a CNN. She is a member of the U.S. National Academy of Sciences and a Fellow of the American Academy of Arts and Sciences. She was a Guggenheim Fellow in 2006, and the Tukey Memorial Lecturer of the Bernoulli Society in 2012. She was President of IMS (Institute of Mathematical Statistics) in 2013-2014 and the Rietz Lecturer of IMS in 2016. She received the E. L. Scott Award from COPSS (Committee of Presidents of Statistical Societies) in 2018. Moreover, Yu was a founding co-director of the Microsoft Research Asia (MSR) Lab at Peking University and is a member of the scientific advisory board at the UK Alan Turing Institute for data science and AI.
More from the Same Authors
2021: Data Opportunities: unsolved medical problems and where new data can help
Bin Yu · Regina Barzilay · Marzyeh Ghassemi · Emma Pierson
2021 Poster: Adaptive wavelet distillation from neural networks through interpretations
Wooseok Ha · Chandan Singh · Francois Lanusse · Srigokul Upadhyayula · Bin Yu
2019 Poster: A Debiased MDI Feature Importance Measure for Random Forests
Xiao Li · Yu Wang · Sumanta Basu · Karl Kumbier · Bin Yu
2017: Deep nets meet real neurons: pattern selectivity of V4 through transfer learning and stability analysis
2017: Invited Talk