

Poster

Embedding Dimension of Contrastive Learning and k-Nearest Neighbors

Dmitrii Avdiukhin · Vaggos Chatziafratis · Orr Fischer · Grigory Yaroslavtsev

East Exhibit Hall A-C #2103
Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract: We study the embedding dimension of distance comparison data in two settings: contrastive learning and $k$-nearest neighbors ($k$-NN). In both cases, the goal is to find the smallest dimension $d$ of an $\ell_p$-space in which a given dataset can be represented. We show that the arboricity of the associated graphs plays a key role in designing embeddings. Using this approach, for the most frequently used $\ell_2$-distance, we get matching upper and lower bounds in both settings.

In contrastive learning, we are given $m$ labeled samples of the form $(x_i, y_i^+, z_i^-)$ representing the fact that the positive example $y_i$ is closer to the anchor $x_i$ than the negative example $z_i$. We show that for representing such a dataset in:
- $\ell_2$: $d = \Theta(\sqrt{m})$ is necessary and sufficient.
- $\ell_p$ for $p \ge 1$: $d = O(m)$ is sufficient and $d = \tilde{\Omega}(\sqrt{m})$ is necessary.
- $\ell_\infty$: $d = O(m^{2/3})$ is sufficient and $d = \tilde{\Omega}(\sqrt{m})$ is necessary.

We also give results for the more general scenario when $t$ negatives are allowed.

In $k$-NN, for each of the $n$ data points we are given an ordered set of the closest $k$ points. We show that for preserving the ordering of the $k$-NN for every point in:
- $\ell_2$: $d = \Theta(k)$ is necessary and sufficient.
- $\ell_p$ for $p \ge 1$: $d = \tilde{O}(k^2)$ is sufficient and $d = \tilde{\Omega}(k)$ is necessary.
- $\ell_\infty$: $d = \tilde{\Omega}(k)$ is necessary.

Furthermore, if the goal is not just to preserve the ordering of the $k$-NN but also to keep them as the nearest neighbors, then $d = \tilde{O}(\mathrm{poly}(k))$ suffices in $\ell_p$ for $p \ge 1$.
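To make the two constraint systems in the abstract concrete, here is a minimal sketch (not from the poster; the function names, helper structure, and toy data are assumptions for illustration) that checks whether a candidate $d$-dimensional embedding satisfies the $m$ contrastive triplet constraints and the $k$-NN ordering constraints under an $\ell_p$ norm.

import numpy as np

def satisfies_triplets(emb, triplets, p=2):
    """Check the contrastive constraints ||x_i - y_i||_p < ||x_i - z_i||_p.

    emb      : dict mapping item id -> d-dimensional vector (hypothetical format)
    triplets : list of (anchor, positive, negative) id triples
    """
    for x, y_pos, z_neg in triplets:
        d_pos = np.linalg.norm(emb[x] - emb[y_pos], ord=p)
        d_neg = np.linalg.norm(emb[x] - emb[z_neg], ord=p)
        if not d_pos < d_neg:
            return False
    return True

def preserves_knn_order(emb, knn_lists, p=2):
    """Check that, for every point, its given k nearest neighbors appear
    in the stated order of increasing l_p distance in the embedding.

    knn_lists : dict mapping item id -> ordered list of its k closest ids
    """
    for x, neighbors in knn_lists.items():
        dists = [np.linalg.norm(emb[x] - emb[u], ord=p) for u in neighbors]
        if any(dists[i] >= dists[i + 1] for i in range(len(dists) - 1)):
            return False
    return True

# Hypothetical toy example: 4 points on a line, embedded in dimension d = 1.
emb = {i: np.array([float(i)]) for i in range(4)}
print(satisfies_triplets(emb, [(0, 1, 3), (2, 1, 0)]))   # True: 1 is closer to 0 than 3 is, etc.
print(preserves_knn_order(emb, {0: [1, 2], 3: [2, 1]}))  # True: stated orderings match distances

Note that the stronger requirement in the abstract's last sentence additionally asks that the listed $k$ points remain the actual nearest neighbors (no other point slips in closer); the sketch above only checks the ordering constraint.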
