We present a new learning strategy for classification problems in which training and/or test data suffer from missing features. In previous work, instances are represented as vectors in some feature space, and one is forced either to impute the missing values or to work in an instance-specific subspace. In contrast, our method treats each instance as a set of (feature, value) pairs, a representation that naturally handles missing values. Building on this framework, we propose a classification strategy for sets: we map (feature, value) pairs into an embedding space and then non-linearly combine the set of embedded vectors. The embedding and combination parameters are learned jointly on the final classification objective. This simple strategy offers great flexibility for encoding prior knowledge about the features in the embedding step and yields favorable results compared to alternative solutions on several datasets.
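The pipeline the abstract describes can be illustrated with a minimal sketch. The embedding form `F[j] + v * U[j]`, the mean pooling, the `tanh` non-linearity, and all parameter names below are illustrative assumptions, not the paper's exact architecture; in the paper, all parameters would be trained jointly on the classification loss rather than drawn at random.

```python
import math
import random

random.seed(0)

n_features, d_embed, n_classes = 5, 8, 3

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical parameters; the paper learns the embedding and the
# combination jointly on the final classification objective.
F = rand_matrix(n_features, d_embed)  # per-feature embedding vectors
U = rand_matrix(n_features, d_embed)  # per-feature value directions
W = rand_matrix(n_classes, d_embed)   # linear classifier on the pooled code

def embed_pair(j, v):
    """Embed a single (feature, value) pair as F[j] + v * U[j] (assumed form)."""
    return [f + v * u for f, u in zip(F[j], U[j])]

def classify(pairs):
    """Mean-pool the embedded pairs, apply tanh, then score each class."""
    embedded = [embed_pair(j, v) for j, v in pairs]
    pooled = [sum(col) / len(embedded) for col in zip(*embedded)]
    hidden = [math.tanh(x) for x in pooled]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W]

# An instance with features 1 and 3 missing: simply omit those pairs,
# no imputation or instance-specific subspace needed.
scores = classify([(0, 0.7), (2, -1.2), (4, 0.3)])
print(len(scores))  # 3
```

The point of the set representation shows up in the last call: a missing feature is handled by leaving its pair out of the input set, so the same parameters serve every missingness pattern.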
Author Information
David Grangier (NEC Laboratories America)
Iain Melvin (NEC Laboratories America)
Related Events (a corresponding poster, oral, or spotlight)
- 2010 Spotlight: Feature Set Embedding for Incomplete Data
  Thu, Dec 9, 01:50 -- 01:55 AM, Room: Regency Ballroom
More from the Same Authors
- 2010 Poster: Label Embedding Trees for Large Multi-Class Tasks
  Samy Bengio · Jason E Weston · David Grangier
- 2009 Poster: Polynomial Semantic Indexing
  Bing Bai · Jason E Weston · David Grangier · Ronan Collobert · Kunihiko Sadamasa · Yanjun Qi · Corinna Cortes · Mehryar Mohri
- 2006 Workshop: Learning to Compare Examples
  David Grangier · Samy Bengio