

Poster

Learning semantic similarity in a continuous space

Michel Deudon

Room 210 #91

Keywords: [ Natural Language Processing ] [ Representation Learning ] [ Deep Autoencoders ] [ Recurrent Networks ] [ Semi-Supervised Learning ] [ Graphical Models ] [ Variational Inference ] [ Similarity and Distance Learning ]


Abstract:

We address the problem of learning semantic representations of questions to measure the similarity between pairs as a continuous distance metric. Our work naturally extends Word Mover's Distance (WMD) [1] by representing text documents as normal distributions instead of bags of embedded words. Our learned metric measures the dissimilarity between two questions as the minimum distance the intent (hidden representation) of one question needs to "travel" to match the intent of the other. We first learn to repeat and reformulate questions, inferring intents as normal distributions with a deep generative model [2] (a variational autoencoder). Semantic similarity between pairs is then learned discriminatively as an optimal transport distance metric (Wasserstein-2) with our novel variational siamese framework. Among known models that read sentences individually, our framework achieves competitive results on the Quora duplicate questions dataset. Our work sheds light on how deep generative models can approximate distributions (semantic representations) to effectively measure semantic similarity with meaningful distance metrics from information theory.
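To make the metric concrete, below is a minimal sketch (not the paper's code) of the closed-form 2-Wasserstein distance between two Gaussian intent representations, assuming diagonal covariances as is standard for VAE posteriors; the function name and the toy vectors are illustrative.

```python
import numpy as np

def wasserstein2_gaussian(mu1, sigma1, mu2, sigma2):
    """Closed-form 2-Wasserstein distance between diagonal Gaussians
    N(mu1, diag(sigma1^2)) and N(mu2, diag(sigma2^2)):
        W2^2 = ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2
    """
    mean_term = np.sum((mu1 - mu2) ** 2)   # cost of moving the means
    cov_term = np.sum((sigma1 - sigma2) ** 2)  # cost of reshaping the spread
    return np.sqrt(mean_term + cov_term)

# Hypothetical example: two question intents encoded as 4-d Gaussians
mu_q1, sigma_q1 = np.array([0.2, -1.0, 0.5, 0.0]), np.array([0.3, 0.4, 0.2, 0.5])
mu_q2, sigma_q2 = np.array([0.1, -0.9, 0.6, 0.1]), np.array([0.3, 0.5, 0.2, 0.4])
print(wasserstein2_gaussian(mu_q1, sigma_q1, mu_q2, sigma_q2))
```

For diagonal Gaussians the optimal transport plan factorizes per dimension, which is why the general trace term in the Gaussian W2 formula collapses to a simple difference of standard deviations.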
