Timezone: »

 
Poster
Minimax Localization of Structural Information in Large Noisy Matrices
Mladen Kolar · Sivaraman Balakrishnan · Alessandro Rinaldo · Aarti Singh

Wed Dec 14 08:45 AM -- 02:59 PM (PST) @

We consider the problem of identifying a sparse set of relevant columns and rows in a large data matrix with highly corrupted entries. This problem of identifying groups from a collection of bipartite variables such as proteins and drugs, biological species and gene sequences, malware and signatures, etc is commonly referred to as biclustering or co-clustering. Despite its great practical relevance, and although several ad-hoc methods are available for biclustering, theoretical analysis of the problem is largely non-existent. The problem we consider is also closely related to structured multiple hypothesis testing, an area of statistics that has recently witnessed a flurry of activity. We make the following contributions: i) We prove lower bounds on the minimum signal strength needed for successful recovery of a bicluster as a function of the noise variance, size of the matrix and bicluster of interest. ii) We show that a combinatorial procedure based on the scan statistic achieves this optimal limit. iii) We characterize the SNR required by several computationally tractable procedures for biclustering including element-wise thresholding, column/row average thresholding and a convex relaxation approach to sparse singular vector decomposition.

Author Information

Mladen Kolar (U Chicago)
Sivaraman Balakrishnan (CMU)
Alessandro Rinaldo (CMU)
Aarti Singh (CMU)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors