Poster
Analysis of Learning from Positive and Unlabeled Data
Marthinus C du Plessis · Gang Niu · Masashi Sugiyama

Wed Dec 10 04:00 PM -- 08:59 PM (PST) @ Level 2, room 210D
Learning a classifier from positive and unlabeled data is an important classification problem that arises in many practical applications. In this paper, we first show that this problem can be solved by cost-sensitive learning between positive and unlabeled data. We then show that convex surrogate loss functions such as the hinge loss may lead to a wrong classification boundary due to an intrinsic bias, but that this problem can be avoided by using non-convex loss functions such as the ramp loss. We next analyze the excess risk when the class prior is estimated from data, and show that the classification accuracy is not sensitive to class-prior estimation if the unlabeled data are dominated by positive data (this is naturally satisfied in inlier-based outlier detection, because inliers dominate the unlabeled dataset). Finally, we provide generalization error bounds and show that, for equal numbers of labeled and unlabeled samples, the generalization error of learning only from positive and unlabeled samples is no worse than $2\sqrt{2}$ times that of the fully supervised case. These theoretical findings are also validated through experiments.
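The first two claims of the abstract can be illustrated with a short sketch. It is not the authors' implementation: it assumes the usual PU decomposition of the marginal density, p(x) = π p_P(x) + (1 − π) p_N(x), a known class prior `pi`, and precomputed classifier scores f(x); the names `pu_risk`, `cost_sensitive_risk`, `hinge_loss`, and `ramp_loss` are hypothetical. Rewriting the negative-class risk with unlabeled data gives R(f) = π E_P[ℓ(f(x)) − ℓ(−f(x))] + E_U[ℓ(−f(x))]. When ℓ(z) + ℓ(−z) = 1, as for the ramp loss ℓ(z) = ½ max(0, min(2, 1 − z)), this collapses to a cost-sensitive positive-versus-unlabeled risk; the hinge loss violates that condition, so the two expressions disagree.

```python
"""Sketch: PU empirical risk with a symmetric (ramp) vs a non-symmetric (hinge) loss.

Assumes a known class prior pi = p(y = +1) and the decomposition
p(x) = pi * p_P(x) + (1 - pi) * p_N(x). All names are illustrative only.
"""
from statistics import fmean
from typing import Callable, Sequence

Loss = Callable[[float], float]


def hinge_loss(z: float) -> float:
    # Convex surrogate: max(0, 1 - z).
    return max(0.0, 1.0 - z)


def ramp_loss(z: float) -> float:
    # Non-convex ramp loss: 0.5 * max(0, min(2, 1 - z)), bounded in [0, 1].
    return 0.5 * max(0.0, min(2.0, 1.0 - z))


def pu_risk(scores_p: Sequence[float], scores_u: Sequence[float],
            pi: float, loss: Loss) -> float:
    # The negative-class risk is rewritten using unlabeled data:
    # (1 - pi) * E_N[l(-f)] = E_U[l(-f)] - pi * E_P[l(-f)], hence
    # R(f) = pi * E_P[l(f) - l(-f)] + E_U[l(-f)].
    term_p = fmean(loss(s) - loss(-s) for s in scores_p)
    term_u = fmean(loss(-s) for s in scores_u)
    return pi * term_p + term_u


def cost_sensitive_risk(scores_p: Sequence[float], scores_u: Sequence[float],
                        pi: float, loss: Loss) -> float:
    # Equals pu_risk only if loss(z) + loss(-z) == 1: then
    # l(f) - l(-f) = 2 * l(f) - 1, i.e. positives weighted by 2 * pi
    # against unlabeled samples treated as negatives with weight 1.
    return (2.0 * pi * fmean(loss(s) for s in scores_p)
            + fmean(loss(-s) for s in scores_u) - pi)


if __name__ == "__main__":
    zs = [-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0]
    # Symmetry check: l(z) + l(-z) for each loss.
    print("ramp :", [ramp_loss(z) + ramp_loss(-z) for z in zs])
    print("hinge:", [hinge_loss(z) + hinge_loss(-z) for z in zs])

    f_p = [1.2, 0.4, 2.0, -0.3]          # hypothetical scores f(x) on positive samples
    f_u = [0.8, -1.5, -0.2, 1.1, -2.0]   # hypothetical scores f(x) on unlabeled samples
    pi = 0.4                             # hypothetical class prior
    for name, loss in [("ramp", ramp_loss), ("hinge", hinge_loss)]:
        print(name, pu_risk(f_p, f_u, pi, loss),
              cost_sensitive_risk(f_p, f_u, pi, loss))
```

Running the script prints ℓ(z) + ℓ(−z) for both losses and both risk forms for each loss: they agree for the ramp loss and differ for the hinge loss. Training a classifier (a model class and an optimizer for the non-convex ramp objective) is deliberately left out of this sketch.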

Author Information

Marthinus C du Plessis (Tokyo Institute of Technology)
Gang Niu (RIKEN)

Gang Niu is currently a research scientist (indefinite-term) at the RIKEN Center for Advanced Intelligence Project. He received his PhD in computer science from the Tokyo Institute of Technology in 2013. Before joining RIKEN as a research scientist, he was a senior software engineer at Baidu and then an assistant professor at the University of Tokyo. He has published more than 70 journal articles and conference papers, including 14 NeurIPS (1 oral and 3 spotlights), 28 ICML, and 2 ICLR (1 oral) papers. He has served as an area chair 14 times, including for ICML 2019--2021, NeurIPS 2019--2021, and ICLR 2021--2022.

Masashi Sugiyama (RIKEN / University of Tokyo)
