Contrastive analysis (CA) refers to the exploration of variations uniquely enriched in a target dataset as compared to a corresponding background dataset generated from sources of variation that are irrelevant to a given task. For example, a biomedical data analyst may wish to find a small set of genes to use as a proxy for variations in genomic data present only among patients with a given disease (target) as opposed to healthy control subjects (background). However, the problem of feature selection in the CA setting has so far received little attention from the machine learning community. In this work we present contrastive feature selection (CFS), a method for performing feature selection in the CA setting. We motivate our approach with a novel information-theoretic analysis of representation learning in the CA setting, and we empirically validate CFS on a semi-synthetic dataset and four real-world biomedical datasets. We find that our method consistently outperforms previously proposed state-of-the-art supervised and fully unsupervised feature selection methods not designed for the CA setting. An open-source implementation of our method is available at https://github.com/suinleelab/CFS.
Author Information
Ethan Weinberger (University of Washington)
Ian Covert (Stanford University)
Su-In Lee (University of Washington)
More from the Same Authors
- 2023: A deep generative model of single-cell methylomic data
  Ethan Weinberger · Su-In Lee
- 2023 Poster: On the Robustness of Removal-Based Feature Attributions
  Chris Lin · Ian Covert · Su-In Lee
- 2020 Poster: Learning Deep Attribution Priors Based On Prior Knowledge
  Ethan Weinberger · Joseph Janizek · Su-In Lee
- 2020 Poster: Understanding Global Feature Contributions With Additive Importance Measures
  Ian Covert · Scott Lundberg · Su-In Lee
- 2017 Poster: A Unified Approach to Interpreting Model Predictions
  Scott M Lundberg · Su-In Lee
- 2017 Oral: A unified approach to interpreting model predictions
  Scott M Lundberg · Su-In Lee
- 2016 Poster: Learning Sparse Gaussian Graphical Models with Overlapping Blocks
  Mohammad Javad Hosseini · Su-In Lee
- 2014 Workshop: Machine Learning in Computational Biology
  Oliver Stegle · Sara Mostafavi · Anna Goldenberg · Su-In Lee · Michael Leung · Anshul Kundaje · Mark B Gerstein · Martin Renqiang Min · Hannes Bretschneider · Francesco Paolo Casale · Loïc Schwaller · Amit G Deshwar · Benjamin A Logsdon · Yuanyang Zhang · Ali Punjani · Derek C Aguiar · Samuel Kaski