An important way to make large training sets is to gather noisy labels from crowds of nonexperts. We propose a minimax entropy principle to improve the quality of these labels. Our method assumes that labels are generated by a probability distribution over workers, items, and labels. By maximizing the entropy of this distribution, the method naturally infers item confusability and worker expertise. We infer the ground truth by minimizing the entropy of this distribution, which we show minimizes the Kullback-Leibler (KL) divergence between the probability distribution and the unknown truth. We show that a simple coordinate descent scheme can optimize minimax entropy. Empirically, our results are substantially better than previously published methods for the same problem.
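To make the setting concrete, here is a minimal sketch of the kind of data the abstract describes: several nonexpert workers label each item, and the labels disagree. This is not the minimax entropy method. It shows only the plain majority-vote baseline, with the Shannon entropy of each item's empirical label distribution as a rough stand-in for item confusability. The function names, item names, and toy votes are all hypothetical.

```python
import math
from collections import Counter


def majority_vote(labels_by_item):
    """Return the most common label for each item.

    Ties go to the label that was seen first (Counter.most_common ordering).
    """
    return {
        item: Counter(labels).most_common(1)[0][0]
        for item, labels in labels_by_item.items()
    }


def label_entropy(labels):
    """Shannon entropy (in nats) of the empirical label distribution of one item."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical toy data: each item maps to the labels given by different workers.
votes = {
    "item_a": ["cat", "cat", "cat", "cat"],  # unanimous
    "item_b": ["cat", "dog", "dog", "cat"],  # evenly split
    "item_c": ["dog", "dog", "cat", "dog"],  # mostly agreed
}

consensus = majority_vote(votes)
confusability = {item: label_entropy(labels) for item, labels in votes.items()}

for item in votes:
    print(item, consensus[item], round(confusability[item], 3))
```

Majority voting weights every worker equally and treats every item as equally hard. The abstract's approach differs on exactly those points, inferring worker expertise and item confusability jointly from the labels.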
Author Information
Denny Zhou (Microsoft Research Redmond)
John C Platt (Google)
Sumit Basu (Microsoft Research)
Yi Mao (Microsoft)
More from the Same Authors
- 2015 Poster: Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing
  Nihar Bhadresh Shah · Denny Zhou
- 2014 Workshop: NIPS’14 Workshop on Crowdsourcing and Machine Learning
  David Parkes · Denny Zhou · Chien-Ju Ho · Nihar Bhadresh Shah · Adish Singla · Jared Heyman · Edwin Simpson · Andreas Krause · Rafael Frongillo · Jennifer Wortman Vaughan · Panagiotis Papadimitriou · Damien Peters
- 2014 Poster: Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing
  Yuchen Zhang · Xi Chen · Denny Zhou · Michael Jordan
- 2014 Spotlight: Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing
  Yuchen Zhang · Xi Chen · Denny Zhou · Michael Jordan
- 2013 Workshop: Data Driven Education
  Jonathan Huang · Sumit Basu · Kalyan Veeramachaneni
- 2013 Workshop: Crowdsourcing: Theory, Algorithms and Applications
  Jennifer Wortman Vaughan · Greg Stoddard · Chien-Ju Ho · Adish Singla · Michael Bernstein · Devavrat Shah · Arpita Ghosh · Evgeniy Gabrilovich · Denny Zhou · Nikhil Devanur · Xi Chen · Alexander Ihler · Qiang Liu · Genevieve Patterson · Ashwinkumar Badanidiyuru Varadaraja · Hossein Azari Soufiani · Jacob Whitehill
- 2009 Workshop: Analysis and Design of Algorithms for Interactive Machine Learning
  Sumit Basu · Ashish Kapoor
- 2007 Workshop: Machine Learning for Systems Problems (Part 2)
  Archana Ganapathi · Sumit Basu · Fei Sha · Emre Kiciman
- 2007 Workshop: Machine Learning for Web Search
  Denny Zhou · Olivier Chapelle · Thorsten Joachims · Thomas Hofmann
- 2007 Workshop: Machine Learning for Systems Problems (Part 1)
  Archana Ganapathi · Sumit Basu · Fei Sha · Emre Kiciman
- 2007 Oral: Non-parametric Modeling of Partially Ranked Data
  Guy Lebanon · Yi Mao
- 2007 Poster: Non-parametric Modeling of Partially Ranked Data
  Guy Lebanon · Yi Mao
- 2007 Poster: Fast Variational Inference for Large-scale Internet Diagnosis
  John C Platt · Emre Kiciman · David A Maltz
- 2006 Poster: Isotonic Conditional Random Fields and Local Sentiment Flow
  Yi Mao · Guy Lebanon