This paper presents a parallel feature selection method for classification that scales up to very high dimensions and large data sizes. Our original method is inspired by group testing theory, under which the feature selection procedure consists of a collection of randomized tests to be performed in parallel. Each test corresponds to a subset of features, to which a scoring function may be applied to measure the relevance of the features in a classification task. We develop a general theory providing sufficient conditions under which true features are guaranteed to be correctly identified. Superior performance of our method is demonstrated on a challenging relation extraction task from a very large data set that has both redundant features and a sample size on the order of millions. We present comprehensive comparisons with state-of-the-art feature selection methods on a range of data sets, on which our method exhibits competitive performance in terms of running time and accuracy. It also yields substantial speedup when used as a pre-processing step for most other existing methods.
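The test-and-score idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the toy data, the class-gap scoring function, and the rule for aggregating test scores into per-feature relevance (mean score of tests containing a feature minus mean score of tests that omit it). The tests are independent of one another, which is what would allow them to be run in parallel.

```python
import random
from statistics import mean


def make_tests(num_features, num_tests, p, rng):
    """Build random tests: each feature joins each test independently with probability p."""
    return [[f for f in range(num_features) if rng.random() < p]
            for _ in range(num_tests)]


def class_gap_score(subset, X, y):
    """Hypothetical scoring function (an assumption, not the paper's):
    absolute gap between the two class means of the summed features in `subset`."""
    pos = [sum(row[f] for f in subset) for row, label in zip(X, y) if label == 1]
    neg = [sum(row[f] for f in subset) for row, label in zip(X, y) if label == 0]
    return abs(mean(pos) - mean(neg))


def select_features(X, y, k, num_tests=200, p=0.5, seed=0):
    """Score random feature subsets, then rank features by how much their
    presence in a test raises the test score (illustrative aggregation rule)."""
    rng = random.Random(seed)
    d = len(X[0])
    tests = make_tests(d, num_tests, p, rng)
    # Each test is scored independently, so this loop could be spread over workers.
    scores = [class_gap_score(t, X, y) for t in tests]
    relevance = []
    for f in range(d):
        inside = [s for t, s in zip(tests, scores) if f in t]
        outside = [s for t, s in zip(tests, scores) if f not in t]
        if not inside or not outside:
            relevance.append(0.0)  # feature was in every test or in none
        else:
            relevance.append(mean(inside) - mean(outside))
    ranked = sorted(range(d), key=lambda f: relevance[f], reverse=True)
    return sorted(ranked[:k])


def toy_data(n=400, d=20, seed=1):
    """Toy data: features 0 and 1 copy the label, the rest are random bits."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[label, label] + [rng.randint(0, 1) for _ in range(d - 2)] for label in y]
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    print(select_features(X, y, k=2))
```

On this toy data the two planted features stand out because tests that contain them score markedly higher than tests that do not. Real data would call for a scoring function suited to the classifier and task at hand.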
Author Information
Yingbo Zhou (State University of New York at Buffalo)
Utkarsh Porwal
Ce Zhang (Wisconsin)
Hung Q Ngo (University at Buffalo, SUNY)
XuanLong Nguyen (University of Michigan)
Christopher Ré (Stanford)
Venu Govindaraju (SUNY Buffalo)
More from the Same Authors
-
2021 : Personalized Benchmarking with the Ludwig Benchmarking Toolkit »
Avanika Narayan · Piero Molino · Karan Goel · Willie Neiswanger · Christopher Ré -
2021 : SKM-TEA: A Dataset for Accelerated MRI Reconstruction with Dense Image Labels for Quantitative Clinical Evaluation »
Arjun Desai · Andrew Schmidt · Elka Rubin · Christopher Sandino · Marianne Black · Valentina Mazzoli · Kathryn Stevens · Robert Boutin · Christopher Ré · Garry Gold · Brian Hargreaves · Akshay Chaudhari -
2021 : Combining Recurrent, Convolutional, and Continuous-Time Models with Structured Learnable Linear State-Space Layers »
Isys Johnson · Albert Gu · Karan Goel · Khaled Saab · Tri Dao · Atri Rudra · Christopher Ré -
2022 Spotlight: Machine Learning on Graphs: A Model and Comprehensive Taxonomy »
Ines Chami · Sami Abu-El-Haija · Bryan Perozzi · Christopher Ré · Kevin Murphy -
2022 Poster: On the Parameterization and Initialization of Diagonal State Space Models »
Albert Gu · Karan Goel · Ankit Gupta · Christopher Ré -
2022 Poster: Self-Supervised Learning of Brain Dynamics from Broad Neuroimaging Data »
Armin Thomas · Christopher Ré · Russell Poldrack -
2022 Poster: HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions »
Lingjiao Chen · Zhihua Jin · Evan Sabri Eyuboglu · Christopher Ré · Matei Zaharia · James Zou -
2022 Poster: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness »
Tri Dao · Dan Fu · Stefano Ermon · Atri Rudra · Christopher Ré -
2022 Poster: Contrastive Adapters for Foundation Model Group Robustness »
Michael Zhang · Christopher Ré -
2022 Poster: Decentralized Training of Foundation Models in Heterogeneous Environments »
Binhang Yuan · Yongjun He · Jared Davis · Tianyi Zhang · Tri Dao · Beidi Chen · Percy Liang · Christopher Ré · Ce Zhang -
2022 Poster: Transform Once: Efficient Operator Learning in Frequency Domain »
Michael Poli · Stefano Massaroli · Federico Berto · Jinkyoo Park · Tri Dao · Christopher Ré · Stefano Ermon -
2022 Poster: Beyond black box densities: Parameter learning for the deviated components »
Dat Do · Nhat Ho · XuanLong Nguyen -
2022 Poster: Machine Learning on Graphs: A Model and Comprehensive Taxonomy »
Ines Chami · Sami Abu-El-Haija · Bryan Perozzi · Christopher Ré · Kevin Murphy -
2022 Poster: S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces »
Eric Nguyen · Karan Goel · Albert Gu · Gordon Downs · Preey Shah · Tri Dao · Stephen Baccus · Christopher Ré -
2022 Poster: Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees »
Jue Wang · Binhang Yuan · Luka Rimanic · Yongjun He · Tri Dao · Beidi Chen · Christopher Ré · Ce Zhang -
2021 Poster: Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers »
Albert Gu · Isys Johnson · Karan Goel · Khaled Saab · Tri Dao · Atri Rudra · Christopher Ré -
2021 Poster: Rethinking Neural Operations for Diverse Tasks »
Nicholas Roberts · Mikhail Khodak · Tri Dao · Liam Li · Christopher Ré · Ameet Talwalkar -
2020 Workshop: Differential Geometry meets Deep Learning (DiffGeo4DL) »
Joey Bose · Emile Mathieu · Charline Le Lan · Ines Chami · Frederic Sala · Christopher De Sa · Maximilian Nickel · Christopher Ré · Will Hamilton -
2020 Poster: HiPPO: Recurrent Memory with Optimal Polynomial Projections »
Albert Gu · Tri Dao · Stefano Ermon · Atri Rudra · Christopher Ré -
2020 Spotlight: HiPPO: Recurrent Memory with Optimal Polynomial Projections »
Albert Gu · Tri Dao · Stefano Ermon · Atri Rudra · Christopher Ré -
2020 Oral: Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent »
Benjamin Recht · Christopher Ré · Stephen Wright · Feng Niu -
2020 Poster: From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering »
Ines Chami · Albert Gu · Vaggos Chatziafratis · Christopher Ré -
2019 Workshop: KR2ML - Knowledge Representation and Reasoning Meets Machine Learning »
Veronika Thost · Christian Muise · Kartik Talamadupula · Sameer Singh · Christopher Ré -
2019 Poster: On the Downstream Performance of Compressed Word Embeddings »
Avner May · Jian Zhang · Tri Dao · Christopher Ré -
2019 Spotlight: On the Downstream Performance of Compressed Word Embeddings »
Avner May · Jian Zhang · Tri Dao · Christopher Ré -
2019 Poster: Multi-Resolution Weak Supervision for Sequential Data »
Paroma Varma · Frederic Sala · Shiori Sagawa · Jason A Fries · Dan Fu · Saelig Khattar · Ashwini Ramamoorthy · Ke Xiao · Kayvon Fatahalian · James Priest · Christopher Ré -
2019 Poster: Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices »
Vincent Chen · Sen Wu · Alexander Ratner · Jen Weng · Christopher Ré -
2019 Poster: Hyperbolic Graph Convolutional Neural Networks »
Ines Chami · Zhitao Ying · Christopher Ré · Jure Leskovec -
2019 Poster: Scalable inference of topic evolution via models for latent geometric structures »
Mikhail Yurochkin · Zhiwei Fan · Aritra Guha · Paraschos Koutris · XuanLong Nguyen -
2018 Workshop: Relational Representation Learning »
Aditya Grover · Paroma Varma · Frederic Sala · Christopher Ré · Jennifer Neville · Stefano Ermon · Steven Holtzen -
2018 Poster: Learning Compressed Transforms with Low Displacement Rank »
Anna Thomas · Albert Gu · Tri Dao · Atri Rudra · Christopher Ré -
2017 Workshop: Learning with Limited Labeled Data: Weak Supervision and Beyond »
Isabelle Augenstein · Stephen Bach · Eugene Belilovsky · Matthew Blaschko · Christoph Lampert · Edouard Oyallon · Emmanouil Antonios Platanios · Alexander Ratner · Christopher Ré -
2017 Workshop: ML Systems Workshop @ NIPS 2017 »
Aparna Lakshmiratan · Sarah Bird · Siddhartha Sen · Christopher Ré · Li Erran Li · Joseph Gonzalez · Daniel Crankshaw -
2017 Demonstration: Babble Labble: Learning from Natural Language Explanations »
Braden Hancock · Paroma Varma · Percy Liang · Christopher Ré · Stephanie Wang -
2017 Poster: Learning to Compose Domain-Specific Transformations for Data Augmentation »
Alexander Ratner · Henry Ehrenberg · Zeshan Hussain · Jared Dunnmon · Christopher Ré -
2017 Poster: Conic Scan-and-Cover algorithms for nonparametric topic modeling »
Mikhail Yurochkin · Aritra Guha · XuanLong Nguyen -
2017 Poster: Gaussian Quadrature for Kernel Features »
Tri Dao · Christopher M De Sa · Christopher Ré -
2017 Spotlight: Gaussian Quadrature for Kernel Features »
Tri Dao · Christopher M De Sa · Christopher Ré -
2017 Poster: Inferring Generative Model Structure with Static Analysis »
Paroma Varma · Bryan He · Payal Bajaj · Nishith Khandwala · Imon Banerjee · Daniel Rubin · Christopher Ré -
2017 Poster: Multi-way Interacting Regression via Factorization Machines »
Mikhail Yurochkin · XuanLong Nguyen · Nikolaos Vasiloglou -
2016 : Invited Talk: You've been using asynchrony wrong your whole life! (Chris Re, Stanford) »
Christopher Ré -
2016 Poster: Cyclades: Conflict-free Asynchronous Machine Learning »
Xinghao Pan · Maximilian Lam · Stephen Tu · Dimitris Papailiopoulos · Ce Zhang · Michael Jordan · Kannan Ramchandran · Christopher Ré · Benjamin Recht -
2016 Poster: Sub-sampled Newton Methods with Non-uniform Sampling »
Peng Xu · Jiyan Yang · Farbod Roosta-Khorasani · Christopher Ré · Michael Mahoney -
2016 Poster: Geometric Dirichlet Means Algorithm for topic inference »
Mikhail Yurochkin · XuanLong Nguyen -
2015 Poster: Asynchronous stochastic convex optimization: the noise is in the noise and SGD don't care »
Sorathan Chaturapruek · John Duchi · Christopher Ré -
2015 Poster: Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width »
Christopher M De Sa · Ce Zhang · Kunle Olukotun · Christopher Ré -
2015 Spotlight: Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width »
Christopher M De Sa · Ce Zhang · Kunle Olukotun · Christopher Ré -
2015 Poster: Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms »
Christopher M De Sa · Ce Zhang · Kunle Olukotun · Christopher Ré -
2014 Workshop: 4th Workshop on Automated Knowledge Base Construction (AKBC) »
Sameer Singh · Fabian M Suchanek · Sebastian Riedel · Partha Pratim Talukdar · Kevin Murphy · Christopher Ré · William Cohen · Tom Mitchell · Andrew McCallum · Jason E Weston · Ramanathan Guha · Boyan Onyshkevych · Hoifung Poon · Oren Etzioni · Ari Kobren · Arvind Neelakantan · Peter Clark -
2014 Poster: Dimensionality Reduction with Subspace Structure Preservation »
Devansh Arpit · Ifeoma Nwogu · Venu Govindaraju -
2013 Workshop: Big Learning : Advances in Algorithms and Data Management »
Xinghao Pan · Haijie Gu · Joseph Gonzalez · Sameer Singh · Yucheng Low · Joseph Hellerstein · Derek G Murray · Raghu Ramakrishnan · Michael Jordan · Christopher Ré -
2013 Poster: Bayesian inference as iterated random functions with applications to sequential inference in graphical models »
Arash Amini · XuanLong Nguyen -
2013 Spotlight: Bayesian inference as iterated random functions with applications to sequential inference in graphical models »
Arash Amini · XuanLong Nguyen -
2013 Poster: An Approximate, Efficient LP Solver for LP Rounding »
Srikrishna Sridhar · Stephen Wright · Christopher Ré · Ji Liu · Victor Bittorf · Ce Zhang -
2007 Spotlight: Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization »
XuanLong Nguyen · Martin J Wainwright · Michael Jordan -
2007 Poster: Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization »
XuanLong Nguyen · Martin J Wainwright · Michael Jordan -
2006 Poster: Distributed PCA and Network Anomaly Detection »
Ling Huang · XuanLong Nguyen · Minos Garofalakis · Michael Jordan · Anthony D Joseph · Nina Taft