We analyze skip-gram with negative-sampling (SGNS), a word embedding method introduced by Mikolov et al., and show that it is implicitly factorizing a word-context matrix, whose cells are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant. We find that another embedding method, NCE, is implicitly factorizing a similar matrix, where each cell is the (shifted) log conditional probability of a word given its context. We show that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks. When dense low-dimensional vectors are preferred, exact factorization with SVD can achieve solutions that are at least as good as SGNS's solutions for word similarity tasks. On analogy questions SGNS remains superior to SVD. We conjecture that this stems from the weighted nature of SGNS's factorization.
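The two representations compared in the abstract can be illustrated with a minimal NumPy sketch. It assumes the "global constant" is log k, where k is the number of negative samples, and it splits the singular values symmetrically between word and context vectors, which is only one possible choice. The function names `shifted_ppmi` and `svd_embeddings` and the toy count matrix are hypothetical, introduced here for illustration; this is not the authors' code.

```python
import numpy as np


def shifted_ppmi(counts, k=1.0):
    """Sparse Shifted Positive PMI: max(PMI(w, c) - log k, 0).

    counts: word-context co-occurrence counts, shape (n_words, n_contexts).
    k: assumed number of negative samples; log k is the global shift.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    word_totals = counts.sum(axis=1, keepdims=True)
    ctx_totals = counts.sum(axis=0, keepdims=True)

    # PMI is -inf for pairs that never co-occur; those cells clip to 0 below.
    pmi = np.full_like(counts, -np.inf)
    seen = counts > 0
    pmi[seen] = np.log(counts[seen] * total / (word_totals * ctx_totals)[seen])

    return np.maximum(pmi - np.log(k), 0.0)


def svd_embeddings(matrix, dim):
    """Dense word vectors from a truncated SVD of the given matrix.

    Uses W = U_d * sqrt(S_d), a symmetric split of the singular values
    (an assumption of this sketch, not something the abstract specifies).
    """
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])


if __name__ == "__main__":
    # Hypothetical toy counts: 3 words x 3 contexts.
    toy_counts = np.array([[8, 1, 0],
                           [1, 6, 2],
                           [0, 2, 9]])
    sppmi = shifted_ppmi(toy_counts, k=2)
    vectors = svd_embeddings(sppmi, dim=2)
    print(sppmi.shape, vectors.shape)
```

Rows of the Shifted Positive PMI matrix serve directly as sparse word representations, while `svd_embeddings` gives the dense low-dimensional alternative. Unlike SGNS, the SVD step weights every cell equally, which is the property the abstract's closing conjecture refers to.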
Author Information
Omer Levy (FAIR, Meta)
Yoav Goldberg (Bar-Ilan University)
More from the Same Authors
- 2021: CommonsenseQA 2.0: Exposing the Limits of AI through Gamification
  Alon Talmor · Ori Yoran · Ronan Le Bras · Chandra Bhagavatula · Yoav Goldberg · Yejin Choi · Jonathan Berant
- 2023 Poster: LIMA: Less Is More for Alignment
  Chunting Zhou · Pengfei Liu · Puxin Xu · Srinivasan Iyer · Jiao Sun · Yuning Mao · Xuezhe Ma · Avia Efrat · Ping Yu · Lili Yu · Susan Zhang · Gargi Ghosh · Mike Lewis · Luke Zettlemoyer · Omer Levy
- 2023 Poster: Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
  Yuval Kirstain · Adam Polyak · Uriel Singer · Shahbuland Matiana · Joe Penna · Omer Levy
- 2023 Poster: Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment
  Royi Rassin · Eran Hirsch · Daniel Glickman · Shauli Ravfogel · Yoav Goldberg · Gal Chechik
- 2023 Oral: Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment
  Royi Rassin · Eran Hirsch · Daniel Glickman · Shauli Ravfogel · Yoav Goldberg · Gal Chechik
- 2019 Poster: Are Sixteen Heads Really Better than One?
  Paul Michel · Omer Levy · Graham Neubig
- 2019 Poster: SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
  Alex Wang · Yada Pruksachatkun · Nikita Nangia · Amanpreet Singh · Julian Michael · Felix Hill · Omer Levy · Samuel Bowman
- 2019 Spotlight: SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
  Alex Wang · Yada Pruksachatkun · Nikita Nangia · Amanpreet Singh · Julian Michael · Felix Hill · Omer Levy · Samuel Bowman
- 2019 Poster: A Little Is Enough: Circumventing Defenses For Distributed Learning
  Moran Baruch · Gilad Baruch · Yoav Goldberg
- 2017 Poster: On-the-fly Operation Batching in Dynamic Computation Graphs
  Graham Neubig · Yoav Goldberg · Chris Dyer