Accurately measuring the similarity between text documents lies at the core of many real-world applications of machine learning. These include web-search ranking, document recommendation, multi-lingual document matching, and article categorization. Recently, a new document metric, the word mover's distance (WMD), has been proposed with unprecedented results on kNN-based document classification. The WMD elevates high-quality word embeddings to document metrics by formulating the distance between two documents as an optimal transport problem between the embedded words. However, the document distances are entirely unsupervised and lack a mechanism to incorporate supervision when available. In this paper we propose an efficient technique to learn a supervised metric, which we call the Supervised WMD (S-WMD) metric. Our algorithm learns document distances that measure the underlying semantic differences between documents by leveraging semantic differences between individual words discovered during supervised training. This is achieved with a linear transformation of the underlying word embedding space and tailored word-specific weights, learned to minimize the stochastic leave-one-out nearest neighbor classification error on a per-document level. We evaluate our metric on eight real-world text classification tasks on which S-WMD consistently outperforms almost all of our 26 competitive baselines.
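To make the optimal transport formulation concrete, the following is a minimal sketch, not the authors' implementation. It computes a transport distance between two small documents by solving the transport linear program with SciPy, and optionally applies a linear map `A` to the embeddings and per-word weights `w1`, `w2` to the word frequencies. The function names, the Euclidean ground cost, and the "multiply by the weight, then renormalize" rule are assumptions made for illustration; the abstract does not specify them, and the learning of `A` and the weights is not shown.

```python
import numpy as np
from scipy.optimize import linprog


def reweight(d, w=None):
    """Normalize word frequencies to sum to 1, optionally after
    scaling each word by a weight (assumed reweighting rule)."""
    d = np.asarray(d, dtype=float)
    if w is not None:
        d = d * np.asarray(w, dtype=float)
    return d / d.sum()


def transport_distance(X1, d1, X2, d2, A=None, w1=None, w2=None):
    """Optimal transport cost between two documents.

    X1, X2: (n_words, dim) word embeddings of each document.
    d1, d2: word frequencies (any positive scale).
    A:      optional (k, dim) linear map applied to all embeddings.
    w1, w2: optional per-word weights for each document.
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    if A is not None:
        X1, X2 = X1 @ A.T, X2 @ A.T
    p, q = reweight(d1, w1), reweight(d2, w2)
    n1, n2 = len(p), len(q)

    # Ground cost: Euclidean distance between every pair of words (assumption).
    C = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=2)

    # Flow matrix T is flattened row-major: T[i, j] -> index i * n2 + j.
    A_eq = np.zeros((n1 + n2, n1 * n2))
    for i in range(n1):
        A_eq[i, i * n2:(i + 1) * n2] = 1.0  # outgoing flow of word i equals p[i]
    for j in range(n2):
        A_eq[n1 + j, j::n2] = 1.0           # incoming flow of word j equals q[j]
    b_eq = np.concatenate([p, q])

    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun)
```

With `A=None` and no weights, this reduces to an unsupervised transport distance over the raw embeddings. Passing a matrix `A` or weight vectors changes the ground cost and the mass assigned to each word, which is the kind of mechanism a supervised variant can tune. A general-purpose LP solver is used here only for clarity; it scales poorly with document length.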
Author Information
Gao Huang (Cornell University)
Chuan Guo (Cornell University)
Matt J Kusner (Washington University in St. Louis)
Yu Sun (Cornell University)
Fei Sha (University of Southern California)
Kilian Weinberger (Cornell University / ASAPP Research)
More from the Same Authors
- 2021: Fixed Neural Network Steganography: Train the images, not the network
  Varsha Kishore · Xiangyu Chen · Yan Wang · Boyi Li · Kilian Weinberger
- 2021 Poster: Online Adaptation to Label Distribution Shift
  Ruihan Wu · Chuan Guo · Yi Su · Kilian Weinberger
- 2021 Poster: Fixes That Fail: Self-Defeating Improvements in Machine-Learning Systems
  Ruihan Wu · Chuan Guo · Awni Hannun · Laurens van der Maaten
- 2021 Poster: ReAct: Out-of-distribution Detection With Rectified Activations
  Yiyou Sun · Chuan Guo · Yixuan Li
- 2021 Poster: BulletTrain: Accelerating Robust Neural Network Training via Boundary Example Mining
  Weizhe Hua · Yichi Zhang · Chuan Guo · Zhiru Zhang · G. Edward Suh
- 2020: Panel
  Kilian Weinberger · Maria De-Arteaga · Shibani Santurkar · Jonathan Frankle · Deborah Raji
- 2020: Q&A with Kilian
  Kilian Weinberger
- 2020: Invited: Kilian Weinberger
  Kilian Weinberger
- 2020 Poster: Identifying Mislabeled Data using the Area Under the Margin Ranking
  Geoff Pleiss · Tianyi Zhang · Ethan Elenberg · Kilian Weinberger
- 2020 Poster: Wasserstein Distances for Stereo Disparity Estimation
  Divyansh Garg · Yan Wang · Bharath Hariharan · Mark Campbell · Kilian Weinberger · Wei-Lun Chao
- 2020 Spotlight: Wasserstein Distances for Stereo Disparity Estimation
  Divyansh Garg · Yan Wang · Bharath Hariharan · Mark Campbell · Kilian Weinberger · Wei-Lun Chao
- 2019 Poster: Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces
  Chuan Guo · Ali Mousavi · Xiang Wu · Daniel Holtmann-Rice · Satyen Kale · Sashank Reddi · Sanjiv Kumar
- 2019 Poster: Positional Normalization
  Boyi Li · Felix Wu · Kilian Weinberger · Serge Belongie
- 2019 Spotlight: Positional Normalization
  Boyi Li · Felix Wu · Kilian Weinberger · Serge Belongie
- 2019 Poster: Exact Gaussian Processes on a Million Data Points
  Ke Alexander Wang · Geoff Pleiss · Jacob Gardner · Stephen Tyree · Kilian Weinberger · Andrew Gordon Wilson
- 2019 Poster: A New Defense Against Adversarial Images: Turning a Weakness into a Strength
  Shengyuan Hu · Tao Yu · Chuan Guo · Wei-Lun Chao · Kilian Weinberger
- 2018 Poster: GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration
  Jacob Gardner · Geoff Pleiss · Kilian Weinberger · David Bindel · Andrew Wilson
- 2018 Spotlight: GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration
  Jacob Gardner · Geoff Pleiss · Kilian Weinberger · David Bindel · Andrew Wilson
- 2017 Poster: On Fairness and Calibration
  Geoff Pleiss · Manish Raghavan · Felix Wu · Jon Kleinberg · Kilian Weinberger
- 2016 Poster: Supervised Word Mover's Distance
  Gao Huang · Chuan Guo · Matt J Kusner · Yu Sun · Fei Sha · Kilian Weinberger
- 2015: Deep Manifold Traversal
  Kilian Weinberger
- 2015 Poster: Fast Distributed k-Center Clustering with Outliers on Massive Data
  Gustavo Malkomes · Matt J Kusner · Wenlin Chen · Kilian Q Weinberger · Benjamin Moseley
- 2015 Poster: Bayesian Active Model Selection with an Application to Automated Audiometry
  Jacob Gardner · Gustavo Malkomes · Roman Garnett · Kilian Weinberger · Dennis Barbour · John Cunningham
- 2014 Workshop: Representation and Learning Methods for Complex Outputs
  Richard Zemel · Dale Schuurmans · Kilian Q Weinberger · Yuhong Guo · Jia Deng · Francesco Dinuzzo · Hal Daumé III · Honglak Lee · Noah A Smith · Richard Sutton · Jiaqian YU · Vitaly Kuznetsov · Luke Vilnis · Hanchen Xiong · Calvin Murdock · Thomas Unterthiner · Jean-Francis Roy · Martin Renqiang Min · Hichem SAHBI · Fabio Massimo Zanzotto
- 2013 Workshop: Output Representation Learning
  Yuhong Guo · Dale Schuurmans · Richard Zemel · Samy Bengio · Yoshua Bengio · Li Deng · Dan Roth · Kilian Q Weinberger · Jason Weston · Kihyuk Sohn · Florent Perronnin · Gabriel Synnaeve · Pablo R Strasser · julien audiffren · Carlo Ciliberto · Dan Goldwasser
- 2012 Poster: Non-linear Metric Learning
  Dor Kedem · Stephen Tyree · Kilian Q Weinberger · Fei Sha · Gert Lanckriet
- 2011 Workshop: Beyond Mahalanobis: Supervised Large-Scale Learning of Similarity
  Greg Shakhnarovich · Dhruv Batra · Brian Kulis · Kilian Q Weinberger
- 2011 Poster: Co-Training for Domain Adaptation
  Minmin Chen · Kilian Q Weinberger · John Blitzer
- 2010 Session: Oral Session 16
  Kilian Q Weinberger
- 2010 Poster: Large Margin Multi-Task Metric Learning
  Shibin Parameswaran · Kilian Q Weinberger
- 2010 Poster: Decoding Ipsilateral Finger Movements from ECoG Signals in Humans
  Yuzong Liu · Mohit Sharma · Charles M Gaona · Jonathan D Breshears · jarod Roland · zachary V Freudenburg · Kilian Q Weinberger · Eric C Leuthardt
- 2008 Poster: Large Margin Taxonomy Embedding for Document Categorization
  Kilian Q Weinberger · Olivier Chapelle
- 2008 Spotlight: Large Margin Taxonomy Embedding for Document Categorization
  Kilian Q Weinberger · Olivier Chapelle
- 2006 Workshop: Novel Applications of Dimensionality Reduction
  John Blitzer · Rajarshi Das · Irina Rish · Kilian Q Weinberger
- 2006 Poster: Graph Regularization for Maximum Variance Unfolding with an Application to Sensor Localization
  Kilian Q Weinberger · Fei Sha · Qihui Zhu · Lawrence Saul