Timezone: »

Pass-efficient unsupervised feature selection
Crystal Maung · Haim Schweitzer

Thu Dec 05 07:00 PM -- 11:59 PM (PST) @ Harrah's Special Events Center, 2nd Floor #None

The goal of unsupervised feature selection is to identify a small number of important features that can represent the data. We propose a new algorithm, a modification of the classical pivoted QR algorithm of Businger and Golub, that requires a small number of passes over the data. The improvements are based on two ideas: keeping track of multiple features in each pass, and skipping calculations that can be shown not to affect the final selection. Our algorithm selects the exact same features as the classical pivoted QR algorithm, and has the same favorable numerical stability. We describe experiments on real-world datasets which sometimes show improvements of {\em several orders of magnitude} over the classical algorithm. These results appear to be competitive with recently proposed randomized algorithms in terms of pass efficiency and run time. On the other hand, the randomized algorithms may produce better features, at the cost of small probability of failure.

Author Information

Crystal Maung (UT Dallas)
Haim Schweitzer (UT Dallas)