Timezone: »
We consider the problem of identifying a sparse set of relevant columns and rows in a large data matrix with highly corrupted entries. This problem of identifying groups from a collection of bipartite variables such as proteins and drugs, biological species and gene sequences, malware and signatures, etc is commonly referred to as biclustering or co-clustering. Despite its great practical relevance, and although several ad-hoc methods are available for biclustering, theoretical analysis of the problem is largely non-existent. The problem we consider is also closely related to structured multiple hypothesis testing, an area of statistics that has recently witnessed a flurry of activity. We make the following contributions: i) We prove lower bounds on the minimum signal strength needed for successful recovery of a bicluster as a function of the noise variance, size of the matrix and bicluster of interest. ii) We show that a combinatorial procedure based on the scan statistic achieves this optimal limit. iii) We characterize the SNR required by several computationally tractable procedures for biclustering including element-wise thresholding, column/row average thresholding and a convex relaxation approach to sparse singular vector decomposition.
Author Information
Mladen Kolar (U Chicago)
Sivaraman Balakrishnan (CMU)
Alessandro Rinaldo (CMU)
Aarti Singh (CMU)
Related Events (a corresponding poster, oral, or spotlight)
-
2011 Spotlight: Minimax Localization of Structural Information in Large Noisy Matrices »
Wed. Dec 14th 09:40 -- 09:44 AM Room
More from the Same Authors
-
2022 : Adaptive Inexact Sequential Quadratic Programming via Iterative Randomized Sketching »
Ilgee Hong · Sen Na · Mladen Kolar -
2022 Spotlight: Lightning Talks 1B-4 »
Andrei Atanov · Shiqi Yang · Wanshan Li · Yongchang Hao · Ziquan Liu · Jiaxin Shi · Anton Plaksin · Jiaxiang Chen · Ziqi Pan · yaxing wang · Yuxin Liu · Stepan Martyanov · Alessandro Rinaldo · Yuhao Zhou · Li Niu · Qingyuan Yang · Andrei Filatov · Yi Xu · Liqing Zhang · Lili Mou · Ruomin Huang · Teresa Yeo · kai wang · Daren Wang · Jessica Hwang · Yuanhong Xu · Qi Qian · Hu Ding · Michalis Titsias · Shangling Jui · Ajay Sohmshetty · Lester Mackey · Joost van de Weijer · Hao Li · Amir Zamir · Xiangyang Ji · Antoni Chan · Rong Jin -
2022 Spotlight: Detecting Abrupt Changes in Sequential Pairwise Comparison Data »
Wanshan Li · Alessandro Rinaldo · Daren Wang -
2022 Poster: Detecting Abrupt Changes in Sequential Pairwise Comparison Data »
Wanshan Li · Alessandro Rinaldo · Daren Wang -
2022 Poster: A Nonconvex Framework for Structured Dynamic Covariance Recovery »
Katherine Tsai · Mladen Kolar · Sanmi Koyejo -
2021 Poster: Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels »
Stefani Karp · Ezra Winston · Yuanzhi Li · Aarti Singh -
2021 Poster: Lattice partition recovery with dyadic CART »
OSCAR HERNAN MADRID PADILLA · Yi Yu · Alessandro Rinaldo -
2020 Poster: A Unified View of Label Shift Estimation »
Saurabh Garg · Yifan Wu · Sivaraman Balakrishnan · Zachary Lipton -
2020 Poster: Preference-based Reinforcement Learning with Finite-Time Guarantees »
Yichong Xu · Ruosong Wang · Lin Yang · Aarti Singh · Artur Dubrawski -
2020 Spotlight: Preference-based Reinforcement Learning with Finite-Time Guarantees »
Yichong Xu · Ruosong Wang · Lin Yang · Aarti Singh · Artur Dubrawski -
2019 Poster: Statistical Analysis of Nearest Neighbor Methods for Anomaly Detection »
Xiaoyi Gu · Leman Akoglu · Alessandro Rinaldo -
2019 Poster: On Testing for Biases in Peer Review »
Ivan Stelmakh · Nihar Shah · Aarti Singh -
2019 Spotlight: On Testing for Biases in Peer Review »
Ivan Stelmakh · Nihar Shah · Aarti Singh -
2019 Poster: Are sample means in multi-armed bandits positively or negatively biased? »
Jaehyeok Shin · Aaditya Ramdas · Alessandro Rinaldo -
2019 Spotlight: Are sample means in multi-armed bandits positively or negatively biased? »
Jaehyeok Shin · Aaditya Ramdas · Alessandro Rinaldo -
2018 Poster: How Many Samples are Needed to Estimate a Convolutional Neural Network? »
Simon Du · Yining Wang · Xiyu Zhai · Sivaraman Balakrishnan · Russ Salakhutdinov · Aarti Singh -
2018 Poster: Optimization of Smooth Functions with Noisy Observations: Local Minimax Rates »
Yining Wang · Sivaraman Balakrishnan · Aarti Singh -
2017 : Persistent homology of KDE filtration of Rips complexes »
Jaehyeok Shin · Alessandro Rinaldo -
2017 Workshop: Advances in Modeling and Learning Interactions from Complex Data »
Gautam Dasarathy · Mladen Kolar · Richard Baraniuk -
2017 Poster: Hypothesis Transfer Learning via Transformation Functions »
Simon Du · Jayanth Koushik · Aarti Singh · Barnabas Poczos -
2017 Poster: A Sharp Error Analysis for the Fused Lasso, with Application to Approximate Changepoint Screening »
Kevin Lin · James Sharpnack · Alessandro Rinaldo · Ryan Tibshirani -
2017 Poster: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos -
2017 Poster: The Expxorcist: Nonparametric Graphical Models Via Conditional Exponential Densities »
Arun Suggala · Mladen Kolar · Pradeep Ravikumar -
2017 Spotlight: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos -
2017 Poster: On the Power of Truncated SVD for General High-rank Matrix Estimation Problems »
Simon Du · Yining Wang · Aarti Singh -
2017 Poster: Noise-Tolerant Interactive Learning Using Pairwise Comparisons »
Yichong Xu · Hongyang Zhang · Aarti Singh · Artur Dubrawski · Kyle Miller -
2016 : Mladen Kolar. Post-Regularization Inference for Dynamic Nonparanormal Graphical Models. »
Mladen Kolar -
2016 Poster: Data Poisoning Attacks on Factorization-Based Collaborative Filtering »
Bo Li · Yining Wang · Aarti Singh · Yevgeniy Vorobeychik -
2016 Poster: Statistical Inference for Cluster Trees »
Jisu KIM · Yen-Chi Chen · Sivaraman Balakrishnan · Alessandro Rinaldo · Larry Wasserman -
2016 Poster: Statistical Inference for Pairwise Graphical Models Using Score Matching »
Ming Yu · Mladen Kolar · Varun Gupta -
2016 Poster: Local Maxima in the Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences »
Chi Jin · Yuchen Zhang · Sivaraman Balakrishnan · Martin J Wainwright · Michael Jordan -
2015 : Tsybakov Noise Adaptive Margin-Based Active Learning »
Aarti Singh -
2015 Poster: Differentially private subspace clustering »
Yining Wang · Yu-Xiang Wang · Aarti Singh -
2014 Workshop: Modern Nonparametrics 3: Automating the Learning Pipeline »
Eric Xing · Mladen Kolar · Arthur Gretton · Samory Kpotufe · Han Liu · Zoltán Szabó · Alan Yuille · Andrew G Wilson · Ryan Tibshirani · Sasha Rakhlin · Damian Kozbur · Bharath Sriperumbudur · David Lopez-Paz · Kirthevasan Kandasamy · Francesco Orabona · Andreas Damianou · Wacha Bounliphone · Yanshuai Cao · Arijit Das · Yingzhen Yang · Giulia DeSalvo · Dmitry Storcheus · Roberto Valerio -
2013 Workshop: Modern Nonparametric Methods in Machine Learning »
Arthur Gretton · Mladen Kolar · Samory Kpotufe · John Lafferty · Han Liu · Bernhard Schölkopf · Alexander Smola · Rob Nowak · Mikhail Belkin · Lorenzo Rosasco · peter bickel · Yue Zhao -
2013 Poster: Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic »
James L Sharpnack · Akshay Krishnamurthy · Aarti Singh -
2013 Poster: Low-Rank Matrix and Tensor Completion via Adaptive Sampling »
Akshay Krishnamurthy · Aarti Singh -
2013 Poster: Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation »
Martin Azizyan · Aarti Singh · Larry Wasserman -
2013 Poster: Cluster Trees on Manifolds »
Sivaraman Balakrishnan · Srivatsan Narayanan · Alessandro Rinaldo · Aarti Singh · Larry Wasserman -
2012 Workshop: Algebraic Topology and Machine Learning »
Sivaraman Balakrishnan · Alessandro Rinaldo · Donald Sheehy · Aarti Singh · Larry Wasserman -
2012 Workshop: Modern Nonparametric Methods in Machine Learning »
Sivaraman Balakrishnan · Arthur Gretton · Mladen Kolar · John Lafferty · Han Liu · Tong Zhang -
2012 Poster: Optimal kernel choice for large-scale two-sample tests »
Arthur Gretton · Bharath Sriperumbudur · Dino Sejdinovic · Heiko Strathmann · Sivaraman Balakrishnan · Massimiliano Pontil · Kenji Fukumizu -
2011 Poster: Noise Thresholds for Spectral Clustering »
Sivaraman Balakrishnan · Min Xu · Akshay Krishnamurthy · Aarti Singh -
2011 Spotlight: Noise Thresholds for Spectral Clustering »
Sivaraman Balakrishnan · Min Xu · Akshay Krishnamurthy · Aarti Singh -
2010 Oral: Identifying graph-structured activation patterns in networks »
James L Sharpnack · Aarti Singh -
2010 Poster: Identifying graph-structured activation patterns in networks »
James L Sharpnack · Aarti Singh -
2009 Poster: Time-Varying Dynamic Bayesian Networks »
Le Song · Mladen Kolar · Eric Xing -
2009 Spotlight: Time-Varying Dynamic Bayesian Networks »
Le Song · Mladen Kolar · Eric Xing -
2009 Poster: Sparsistent Learning of Varying-coefficient Models with Structural Changes »
Mladen Kolar · Le Song · Eric Xing -
2009 Spotlight: Sparsistent Learning of Varying-coefficient Models with Structural Changes »
Mladen Kolar · Le Song · Eric Xing -
2008 Poster: Unlabeled data: Now it helps, now it doesn't »
Aarti Singh · Rob Nowak · Jerry Zhu -
2008 Oral: Unlabeled data: Now it helps, now it doesn't »
Aarti Singh · Rob Nowak · Jerry Zhu