Abstract:
Advances in medical information technology have resulted in enormous warehouses of data that are at once overwhelming and sparse. A single patient visit may yield tens to thousands of measurements and items of structured information, including clinical factors, diagnostic imaging, lab tests, and genomic and proteomic tests. Hospitals may see thousands of patients each year. However, each patient may have relatively few visits to any particular medical provider. The resulting data are a heterogeneous amalgam of patient demographics, vital signs, diagnoses, records of treatment and medication receipt, and annotations made by nurses or doctors, each with its own idiosyncrasies.
The objective of this workshop is to discuss how advanced machine learning techniques can derive clinical and scientific impact from these messy, incomplete, and partial data. We will bring together machine learning researchers and experts in medical informatics who are involved in the development of algorithms or intelligent systems designed to improve the quality of healthcare. Relevant areas include health monitoring systems, clinical data labelling and clustering, clinical outcome prediction, efficient and scalable processing of medical records, feature selection or dimensionality reduction in clinical data, tools for personalized medicine, time-series analysis with medical applications, and clinical genomics.
Detailed Description:
An important issue in clinical applications is the peculiarity of the available data: an amalgam of patient demographics, collected vital signs, diagnoses, records of administered treatment and medication and, potentially, annotations made by nurses or doctors. Vital signs are typically available as moving averages over varying time horizons [1], and occasionally in their original form (sampled at high frequency). The extensive data collection usually results in an overall abundance of data, which might lead to the falsely optimistic conclusion that its sheer magnitude will make training of any system trivial. The insidious issue with clinical data, which not even the best-assembled repositories [2] manage to overcome, is its lack of continuity and consistency. The data come from a vast number of patients, each with very specific clinical conditions. Data on individual patients may, however, be quite sparse or incomplete and often contain significant gaps due to circumstance or equipment malfunction. Not only are the samples limited for a given patient, but the health status of a single person can vary due to differences in external factors such as medication. These circumstances make short work of the typical assumptions made by learning techniques. IID samples that satisfy some tidy noise condition are virtually impossible to encounter in longitudinal physiologic data, on which medical diagnoses and decisions are based. To further complicate matters, records can be missing or outright incorrect, adding to the inevitable noise in vital sign readings, diagnostics and treatment records. Moreover, a patient can be assigned several diagnoses drawn from a list of thousands of ICD-9 codes. All things considered, the so-called ‘big data’ present in clinical applications are surprisingly sparse if the entire feature space is taken into account.
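To make the data format concrete, the following is a minimal sketch of how moving averages over several time horizons might be computed from an irregularly sampled vital sign with missing readings. The function name `trailing_means`, the window lengths, and the heart-rate values are all hypothetical and chosen only for illustration; no particular repository or monitoring system is assumed.

```python
import math

def trailing_means(times, values, horizons):
    """Average the non-missing readings in each trailing window.

    times:    sample times in seconds, in ascending order
    values:   readings, with math.nan marking a missing reading
    horizons: window lengths in seconds
    Returns one row per sample, holding one mean per horizon
    (math.nan when a window contains no valid reading).
    """
    rows = []
    for i, t in enumerate(times):
        row = []
        for h in horizons:
            # Keep readings taken less than h seconds before time t.
            window = [v for s, v in zip(times[:i + 1], values[:i + 1])
                      if t - s < h and not math.isnan(v)]
            row.append(sum(window) / len(window) if window else math.nan)
        rows.append(row)
    return rows

# Hypothetical heart-rate readings (beats per minute) with one
# missing value at t=120 and a long gap before the last sample.
times = [0, 60, 120, 180, 600]
heart_rate = [80.0, 82.0, math.nan, 90.0, 100.0]
features = trailing_means(times, heart_rate, horizons=[120, 300])
```

After the long gap, both windows contain only the final reading, so the two horizons carry the same value: an abundance of rows does not imply an abundance of information.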
Despite the existence of algorithms that address some of these problems, a number of important research topics still remain open, including but not limited to:
(i) What individual-level predictions can be made from such partial, incomplete data?
(ii) How can partial, incomplete time series from multiple patients be combined to create population-level and sub-population-level understanding of treatment and disease? What are the best ways to stratify or cluster the data (using patient demographics, diagnostics and/or treatment) to ensure a plausible trade-off between model specialization and sample sufficiency? What is the best way to deal with outliers, and how can incorrect data be detected?
(iii) How can machine learning methods circumvent some of the inherent problems of large-scale clinical data? Can machine learning techniques and clinical tools (e.g. clinical review, expert ontologies, inter-institutional data) be used to adapt to the sparsity and biases in the data?
(iv) How can these data be used to assess standards of care and investigate the efficacy of various treatment programs? Generally, how can these data be used to help us better understand many of the complex causal relationships in medicine?
(v) Training classification models requires accurate labeling, and this in turn requires considerable effort on the part of human experts. Can we reduce the amount of labeling needed through active learning? Can we use as yet unlabeled data and combine a semi-supervised approach with active learning to obtain higher accuracy?
(vi) What are robust ways of modeling cross-signal correlations? How can we incorporate diagnostics and sparse, high-dimensional treatment information in clinical models? Can we characterize the effect of treatment on vital signs?
As just one example application where such research questions would be highly relevant, consider a vital sign monitoring system. Monitoring patient status is a crucial aspect of health care, with the task of anticipating and preventing critical episodes being traditionally entrusted to the nursing staff. However, there is increasing interest in, and demand for, automated tools that detect any abrupt worsening of health status in critically ill patients [3,4]. Most of the initial efforts focused on processing only one signal, a notable example being the detection of arrhythmias from electrocardiograms. However, it became increasingly clear that considering correlations across signals and deriving features over varying time windows holds great promise for the prognosis of adverse events [5,6]. Additionally, with the emergence of personalized care [7] and wearable technology for health monitoring [8], there is an increasing need for real-time online processing of vital signs and for adaptive models suited to the ever-changing parameters specific to these applications. Given the heterogeneous data available, how can we develop flexible models that can gradually adapt to the characteristics of a patient as more data is obtained? Can this update be efficiently performed?
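As one deliberately simple illustration of an efficiently updatable per-patient model, the sketch below keeps an exponentially weighted running mean and variance of a single vital sign and flags readings that deviate sharply from that baseline. It is not the method of any cited work; the class name `AdaptiveBaseline`, the parameters `alpha`, `threshold` and `warmup`, and the sample readings are assumptions made for the example.

```python
import math

class AdaptiveBaseline:
    """Exponentially weighted running mean and variance for one vital sign."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # weight given to each new reading
        self.threshold = threshold  # flag when the deviation exceeds this many std
        self.warmup = warmup        # readings to observe before flagging
        self.mean = None
        self.var = 0.0
        self.count = 0

    def update(self, x):
        """Score reading x against the current baseline, then adapt to it.

        Returns True when x is flagged as an abrupt deviation.
        """
        if x is None or math.isnan(x):
            return False            # missing reading: leave the state unchanged
        if self.mean is None:
            self.mean = x           # first valid reading initializes the baseline
            self.count = 1
            return False
        diff = x - self.mean
        std = math.sqrt(self.var)
        flagged = (self.count >= self.warmup and std > 0
                   and abs(diff) > self.threshold * std)
        # Constant-time update of the exponentially weighted moments.
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        self.count += 1
        return flagged

# Hypothetical heart-rate stream ending in an abrupt jump.
readings = [70, 71, 69, 70, 71, 70, 69, 71, 70, 120]
monitor = AdaptiveBaseline()
flags = [monitor.update(float(r)) for r in readings]
```

Each update costs constant time and memory, so one such object per patient and per signal can run online. A single-signal baseline like this ignores the cross-signal correlations discussed above, which is exactly where richer models are needed.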
The workshop will approach the identified challenges from two perspectives. On one hand, healthcare experts will describe their requirements and the main issues in processing medical data. On the other hand, machine learning researchers will present algorithms and tools they have developed for clinical applications, describing their relevance. Most importantly, the discussions are meant to establish the limitations of current approaches and the feasibility of extending them to deal with the aforementioned data issues, and to brainstorm on promising ML techniques that have been insufficiently exploited for these tasks.
References:
[1] Lawhern V., Hairston W.D., Robbins K., "Optimal Feature Selection for Artifact Classification in EEG Time Series", Foundations of Augmented Cognition Lecture Notes in Computer Science Volume 8027, 2013, pp 326-334
[2] MIMIC + other repositories
[3] Gopalakrishnan V., Lustgarten J., Visweswaran S., and Cooper G, “Bayesian rule learning for biomedical data mining”. Journal of Bioinformatics, 26, 2010.
[4] Randall Moorman J., Delos J. B., Flower A., Cao H., Kovatchev B.P., Richman J. S., and Lake D.E., “Cardiovascular oscillations at the bedside: early diagnosis of neonatal sepsis using heart rate characteristics monitoring”. Physiol Meas, 32 (11):1821-32, Nov 2011.
[5] Fiterau M., Dubrawski A., and Ye C., “Real-time adaptive monitoring of vital signs for clinical alarm preemption”, In Proceedings of the 2010 International Society for Disease Surveillance Annual Conference, 2011.
[6] Seely A.J.E., “Complexity at the bedside”, Journal of Critical Care, Jun 2011.
[7] Narimatsu H., Kitanaka C., Kubota I., Sato S., Ueno Y., Kato T., Fukao A., Yamashita H., Kayama T., “New developments in medical education for the realization of next-generation personalized medicine: concept and design of a medical education and training program through the genomic cohort study”, Journal of Human Genetics, June 2013.
[8] Pantelopoulos A., Bourbakis N.G., “A Survey on Wearable Sensor-Based Systems for Health Monitoring and Prognosis”, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 40, no. 1, pp. 1-12, Jan. 2010.
Author Information
Gunnar Rätsch (ETH Zürich)
Madalina Fiterau (UMass Amherst)
Madalina Fiterau is an Assistant Professor at the College of Information and Computer Sciences at UMass Amherst, with a focus on AI/ML. Previously, she was a Postdoctoral Fellow in the Computer Science Department at Stanford University, working with Professors Chris Ré and Scott Delp in the Mobilize Center. Madalina obtained a PhD in Machine Learning from Carnegie Mellon University in September 2015, advised by Professor Artur Dubrawski. The focus of her PhD thesis, entitled “Discovering Compact and Informative Structures through Data Partitioning”, is learning interpretable ensembles, with applicability ranging from image classification to a clinical alert prediction system. Madalina is currently expanding her research on interpretable models, in part by applying deep learning to obtain salient representations from biomedical “deep” data, including time series, text and images. Madalina is the recipient of the GE Foundation Scholar Leader Award for Central and Eastern Europe. She is also the recipient of the Marr Prize for Best Paper at ICCV 2015 and of the Star Research Award at the Annual Congress of the Society of Critical Care Medicine 2016. She has organized two editions of the Machine Learning for Clinical Data Analysis Workshop at NIPS, in 2013 and 2014.
Julia Vogt (Memorial Sloan Kettering Cancer Center)
More from the Same Authors
-
2021 : HiRID-ICU-Benchmark --- A Comprehensive Machine Learning Benchmark on High-resolution ICU Data »
Hugo Yèche · Rita Kuznetsova · Marc Zimmermann · Matthias Hüser · Xinrui Lyu · Martin Faltys · Gunnar Rätsch -
2021 : Learning Single-Cell Perturbation Responses using Neural Optimal Transport »
Charlotte Bunne · Stefan Stark · Gabriele Gut · Andreas Krause · Gunnar Rätsch · Lucas Pelkmans · Kjong Lehmann -
2022 : On the Importance of Clinical Notes in Multi-modal Learning for EHR Data »
Severin Husmann · Hugo Yèche · Gunnar Rätsch · Rita Kuznetsova -
2022 Workshop: Learning from Time Series for Health »
Sana Tonekaboni · Thomas Hartvigsen · Satya Narayan Shukla · Gunnar Rätsch · Marzyeh Ghassemi · Anna Goldenberg -
2022 Poster: Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations »
Alexander Immer · Tycho van der Ouderaa · Gunnar Rätsch · Vincent Fortuin · Mark van der Wilk -
2019 : Poster session »
Sebastian Farquhar · Erik Daxberger · Andreas Look · Matt Benatan · Ruiyi Zhang · Marton Havasi · Fredrik Gustafsson · James A Brofos · Nabeel Seedat · Micha Livne · Ivan Ustyuzhaninov · Adam Cobb · Felix D McGregor · Patrick McClure · Tim R. Davidson · Gaurush Hiranandani · Sanjeev Arora · Masha Itkina · Didrik Nielsen · William Harvey · Matias Valdenegro-Toro · Stefano Peluchetti · Riccardo Moriconi · Tianyu Cui · Vaclav Smidl · Taylan Cemgil · Jack Fitzsimons · He Zhao · · mariana vargas vieyra · Apratim Bhattacharyya · Rahul Sharma · Geoffroy Dubourg-Felonneau · Jonathan Warrell · Slava Voloshynovskiy · Mihaela Rosca · Jiaming Song · Andrew Ross · Homa Fashandi · Ruiqi Gao · Hooshmand Shokri Razaghi · Joshua Chang · Zhenzhong Xiao · Vanessa Boehm · Giorgio Giannone · Ranganath Krishnan · Joe Davison · Arsenii Ashukha · Jeremiah Liu · Sicong (Sheldon) Huang · Evgenii Nikishin · Sunho Park · Nilesh Ahuja · Mahesh Subedar · · Artyom Gadetsky · Jhosimar Arias Figueroa · Tim G. J. Rudner · Waseem Aslam · Adrián Csiszárik · John Moberg · Ali Hebbal · Kathrin Grosse · Pekka Marttinen · Bang An · Hlynur Jónsson · Samuel Kessler · Abhishek Kumar · Mikhail Figurnov · Omesh Tickoo · Steindor Saemundsson · Ari Heljakka · Dániel Varga · Niklas Heim · Simone Rossi · Max Laves · Waseem Gharbieh · Nicholas Roberts · Luis Armando Pérez Rey · Matthew Willetts · Prithvijit Chakrabarty · Sumedh Ghaisas · Carl Shneider · Wray Buntine · Kamil Adamczewski · Xavier Gitiaux · Suwen Lin · Hao Fu · Gunnar Rätsch · Aidan Gomez · Erik Bodin · Dinh Phung · Lennart Svensson · Juliano Tusi Amaral Laganá Pinto · Milad Alizadeh · Jianzhun Du · Kevin Murphy · Beatrix Benkő · Shashaank Vattikuti · Jonathan Gordon · Christopher Kanan · Sontje Ihler · Darin Graham · Michael Teng · Louis Kirsch · Tomas Pevny · Taras Holotyak -
2019 Workshop: Machine Learning for Health (ML4H): What makes machine learning in medicine different? »
Andrew Beam · Tristan Naumann · Brett Beaulieu-Jones · Irene Y Chen · Madalina Fiterau · Samuel Finlayson · Emily Alsentzer · Adrian Dalca · Matthew McDermott -
2018 Workshop: Machine Learning for Health (ML4H): Moving beyond supervised learning in healthcare »
Andrew Beam · Tristan Naumann · Marzyeh Ghassemi · Matthew McDermott · Madalina Fiterau · Irene Y Chen · Brett Beaulieu-Jones · Michael Hughes · Farah Shamout · Corey Chivers · Jaz Kandola · Alexandre Yahi · Samuel Finlayson · Bruno Jedynak · Peter Schulam · Natalia Antropova · Jason Fries · Adrian Dalca · Irene Chen -
2018 Poster: Boosting Black Box Variational Inference »
Francesco Locatello · Gideon Dresdner · Rajiv Khanna · Isabel Valera · Gunnar Rätsch -
2018 Spotlight: Boosting Black Box Variational Inference »
Francesco Locatello · Gideon Dresdner · Rajiv Khanna · Isabel Valera · Gunnar Rätsch -
2017 Workshop: Machine Learning for Health (ML4H) - What Parts of Healthcare are Ripe for Disruption by Machine Learning Right Now? »
Jason Fries · Alex Wiltschko · Andrew Beam · Isaac S Kohane · Jasper Snoek · Peter Schulam · Madalina Fiterau · David Kale · Rajesh Ranganath · Bruno Jedynak · Michael Hughes · Tristan Naumann · Natalia Antropova · Adrian Dalca · SHUBHI ASTHANA · Prateek Tandon · Jaz Kandola · Uri Shalit · Marzyeh Ghassemi · Tim Althoff · Alexander Ratner · Jumana Dakka -
2017 Poster: Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees »
Francesco Locatello · Michael Tschannen · Gunnar Rätsch · Martin Jaggi -
2016 Workshop: Machine Learning for Health »
Uri Shalit · Marzyeh Ghassemi · Jason Fries · Rajesh Ranganath · Theofanis Karaletsos · David Kale · Peter Schulam · Madalina Fiterau -
2015 Demonstration: An interactive system for the extraction of meaningful visualizations from high-dimensional data »
Madalina Fiterau · Artur Dubrawski · Donghan Wang -
2014 Workshop: Second Workshop on Transfer and Multi-Task Learning: Theory meets Practice »
Urun Dogan · Tatiana Tommasi · Yoshua Bengio · Francesco Orabona · Marius Kloft · Andres Munoz · Gunnar Rätsch · Hal Daumé III · Mehryar Mohri · Xuezhi Wang · Daniel Hernández-lobato · Song Liu · Thomas Unterthiner · Pascal Germain · Vinay P Namboodiri · Michael Goetz · Christopher Berlind · Sigurd Spieckermann · Marta Soare · Yujia Li · Vitaly Kuznetsov · Wenzhao Lian · Daniele Calandriello · Emilie Morvant -
2013 Workshop: Machine Learning for Clinical Data Analysis and Healthcare »
Jenna Wiens · Finale P Doshi-Velez · Can Ye · Madalina Fiterau · Shipeng Yu · Le Lu · Balaji R Krishnapuram -
2012 Session: Oral Session 4 »
Gunnar Rätsch -
2012 Poster: Projection Retrieval for Classification »
Madalina Fiterau · Artur Dubrawski -
2011 Workshop: Machine Learning in Computational Biology »
Jean-Philippe Vert · Gunnar Rätsch · Yanjun Qi · Tomer Hertz · Anna Goldenberg · Christina Leslie -
2011 Poster: Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation »
Nico Goernitz · Christian Widmer · Georg Zeller · Andre Kahles · Soeren Sonnenburg · Gunnar Rätsch -
2010 Workshop: Machine Learning in Computational Biology »
Gunnar Rätsch · Jean-Philippe Vert · Tomer Hertz · Yanjun Qi -
2008 Workshop: Machine Learning in Computational Biology »
Gal Chechik · Christina Leslie · Quaid Morris · William S Noble · Gunnar Rätsch -
2008 Mini Symposium: Machine Learning in Computational Biology »
Gal Chechik · Christina Leslie · Quaid Morris · William S Noble · Gunnar Rätsch -
2008 Poster: An empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis »
Gabriele B Schweikert · Christian Widmer · Bernhard Schölkopf · Gunnar Rätsch -
2007 Workshop: Machine Learning in Computational Biology (Part 2) »
Gal Chechik · Christina Leslie · Quaid Morris · William S Noble · Gunnar Rätsch · Koji Tsuda -
2007 Workshop: Machine Learning in Computational Biology (Part 1) »
Gal Chechik · Christina Leslie · Quaid Morris · William S Noble · Gunnar Rätsch · Koji Tsuda -
2007 Spotlight: Boosting Algorithms for Maximizing the Soft Margin »
Manfred K. Warmuth · Karen Glocer · Gunnar Rätsch -
2007 Poster: Boosting Algorithms for Maximizing the Soft Margin »
Manfred K. Warmuth · Karen Glocer · Gunnar Rätsch -
2006 Workshop: New Problems and Methods in Computational Biology »
Gal Chechik · Quaid Morris · Koji Tsuda · Gunnar Rätsch · Christina Leslie · William S Noble -
2006 Poster: Large Scale Hidden Semi-Markov SVMs »
Gunnar Rätsch · Soeren Sonnenburg -
2006 Demonstration: SHOGUN Machine Learning Toolbox »
Soeren Sonnenburg · Gunnar Rätsch