Multi-Resolution Weak Supervision for Sequential Data
Paroma Varma · Frederic Sala · Shiori Sagawa · Jason Fries · Dan Fu · Saelig Khattar · Ashwini Ramamoorthy · Ke Xiao · Kayvon Fatahalian · James Priest · Christopher Ré

Thu Dec 12 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #110

Since manually labeling training data is slow and expensive, recent industrial and scientific research efforts have turned to weaker or noisier sources of supervision. However, existing weak supervision approaches fail to model multi-resolution sources for sequential data, like video, that can assign labels to individual elements or collections of elements in a sequence. A key challenge in weak supervision is estimating the unknown accuracies and correlations of these sources without using labeled data. Multi-resolution sources exacerbate this challenge due to complex correlations and sample complexity that scales with the length of the sequence. We propose Dugong, the first framework to model multi-resolution weak supervision sources with complex correlations to assign probabilistic labels to training data. Theoretically, we prove that Dugong, under mild conditions, can uniquely recover the unobserved accuracy and correlation parameters and use parameter sharing to improve sample complexity. Our method assigns clinician-validated labels to population-scale biomedical video repositories, helping outperform traditional supervision by 36.8 F1 points and addressing a key use case where machine learning has been severely limited by the lack of expert-labeled data. On average, Dugong improves over traditional supervision by 16.0 F1 points and existing weak supervision approaches by 24.2 F1 points across several video and sensor classification tasks.

Author Information

Paroma Varma (Stanford University)
Frederic Sala (Stanford University)
Shiori Sagawa (Stanford University)
Jason Fries (Stanford University)

I'm currently a research scientist in the Shah Lab at Stanford University. Previously I was a CS postdoc in Stanford's Mobilize Center, mentored by Chris Ré and Scott Delp. My recent research explores weakly supervised machine learning, where indirect and often noisy sources of domain knowledge are combined to train models. Obtaining large-scale, expert-labeled training data is a significant challenge in medicine, making it difficult to take advantage of state-of-the-art machine learning tools. Weakly supervised methods enable new mechanisms of sharing medical expertise and generating training sets from large-scale collections of unlabeled text, medical imaging, and sensor data.

Dan Fu (Stanford University)
Saelig Khattar (Stanford University)
Ashwini Ramamoorthy (Stanford University)
Ke Xiao (Stanford University)
Kayvon Fatahalian (Stanford University)
James Priest (Stanford University)
Christopher Ré (Stanford University)

More from the Same Authors