Poster
Embedding Dimension of Contrastive Learning and -Nearest Neighbors
Dmitrii Avdiukhin · Vaggos Chatziafratis · Orr Fischer · Grigory Yaroslavtsev
East Exhibit Hall A-C #2103
Abstract:
We study the embedding dimension of distance comparison data in two settings: contrastive learning and -nearest neighbors (-NN). In both cases, the goal is to find the smallest dimension of an -space in which a given dataset can be represented. We show that the arboricity of the associated graphs plays a key role in designing embeddings. Using this approach, for the most frequently used -distance, we get matching upper and lower bounds in both settings. In contrastive learning, we are given labeled samples of the form representing the fact that the positive example is closer to the anchor than the negative example . We show that for representing such dataset in:- : is necessary and sufficient.- for : is sufficient and is necessary.- : is sufficient and is necessary.We also give results for the more general scenario when negatives are allowed.In -NN, for each of the data points we are given an ordered set of the closest points. We show that for preserving the ordering of the -NN for every point in:- : is necessary and sufficient.- for : is sufficient and is necessary.- : is necessary.Furthermore, if the goal is to not just preserve the ordering of the -NN but also keep them as the nearest neighbors then suffices in for .
Chat is not available.