Unsupervised text embedding has shown great power in a wide range of NLP tasks. While text embeddings are typically learned in the Euclidean space, directional similarity is often more effective in tasks such as word similarity and document clustering, which creates a gap between the training stage and the usage stage of text embedding. To close this gap, we propose a spherical generative model under which unsupervised word and paragraph embeddings are learned jointly. To learn text embeddings in the spherical space, we develop an efficient optimization algorithm with a convergence guarantee based on Riemannian optimization. Our model is highly efficient and achieves state-of-the-art performance on various text embedding tasks, including word similarity and document clustering.
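The abstract does not state the update rule, so the following is only a minimal sketch of generic Riemannian SGD on the unit sphere, not the authors' algorithm. Each step projects the Euclidean gradient onto the tangent space at the current point, moves against it, and then retracts onto the sphere by renormalizing. The helper names (`normalize`, `cosine`, `riemannian_sgd_step`) and the toy objective are hypothetical choices for illustration. The exact update and its convergence analysis are given in the paper itself.

```python
import math


def normalize(v):
    """Scale v to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Directional similarity: cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def riemannian_sgd_step(x, euclid_grad, lr):
    """One generic Riemannian SGD step on the unit sphere (hypothetical helper).

    x: current point with ||x|| = 1
    euclid_grad: gradient of the loss w.r.t. x in the ambient Euclidean space
    lr: step size
    """
    # Project the Euclidean gradient onto the tangent space at x:
    # grad_R = (I - x x^T) g, i.e. remove the component along x.
    dot = sum(a * g for a, g in zip(x, euclid_grad))
    riem_grad = [g - dot * a for a, g in zip(x, euclid_grad)]
    # Step against the Riemannian gradient. Because riem_grad is orthogonal to x,
    # the moved point has norm >= 1, so renormalizing (the retraction) is safe.
    moved = [a - lr * r for a, r in zip(x, riem_grad)]
    return normalize(moved)


# Toy usage (hypothetical objective): pull a word vector u toward a fixed
# context direction c by minimizing -<u, c>, whose Euclidean gradient is -c.
u = normalize([1.0, 0.0, 0.0])
c = normalize([0.0, 1.0, 0.0])
for _ in range(200):
    u = riemannian_sgd_step(u, [-x for x in c], lr=0.1)
print(round(cosine(u, c), 4))
```

Keeping every vector on the unit sphere during training means the quantity being optimized and the cosine similarity used downstream measure the same thing, which is the training/usage gap the abstract describes.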
Author Information
Yu Meng (University of Illinois at Urbana-Champaign)
Jiaxin Huang (University of Illinois Urbana-Champaign)
Guangyuan Wang (UIUC)
Chao Zhang (Georgia Institute of Technology)
Honglei Zhuang (Google Research)
Lance Kaplan (U.S. Army Research Laboratory)
Jiawei Han (UIUC)
More from the Same Authors
2022: Shift-Robust Node Classification via Graph Clustering Co-training
Qi Zhu · Chao Zhang · Chanyoung Park · Carl Yang · Jiawei Han
2022 Poster: UnfoldML: Cost-Aware and Uncertainty-Based Dynamic 2D Prediction for Multi-Stage Classification
Yanbo Xu · Alind Khare · Glenn Matlin · Monish Ramadoss · Rishikesan Kamaleswaran · Chao Zhang · Alexey Tumanov
2022 Poster: End-to-end Stochastic Optimization with Energy-based Model
Lingkai Kong · Jiaming Cui · Yuchen Zhuang · Rui Feng · B. Aditya Prakash · Chao Zhang
2022 Poster: Generating Training Data with Language Models: Towards Zero-Shot Language Understanding
Yu Meng · Jiaxin Huang · Yu Zhang · Jiawei Han
2021 Poster: When in Doubt: Neural Non-Parametric Uncertainty Quantification for Epidemic Forecasting
Harshavardhan Kamarthi · Lingkai Kong · Alexander Rodriguez · Chao Zhang · B. Aditya Prakash
2021 Poster: Transfer Learning of Graph Neural Networks with Ego-graph Information Maximization
Qi Zhu · Carl Yang · Yidan Xu · Haonan Wang · Chao Zhang · Jiawei Han
2021 Poster: COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Yu Meng · Chenyan Xiong · Payal Bajaj · Saurabh Tiwary · Paul Bennett · Jiawei Han · Xia Song
2018 Poster: Evidential Deep Learning to Quantify Classification Uncertainty
Murat Sensoy · Lance Kaplan · Melih Kandemir