

Poster

Adaptive Sampling for Efficient Softmax Approximation

Tavor Baharav · Ryan Kang · Colin Sullivan · Mo Tiwari · Eric Luxenberg · David Tse · Mert Pilanci

Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: The softmax function is ubiquitous in machine learning and optimization applications. Computing the full softmax of a matrix-vector product can be computationally expensive in high-dimensional settings. In many applications, however, it suffices to compute only the top few outputs of the softmax. In this work we present an algorithm, dubbed AdaptiveSoftmax, that adaptively computes the top $k$ softmax values more efficiently than the full softmax computation. We provide PAC guarantees for AdaptiveSoftmax and demonstrate that it is more sample-efficient than the full softmax computation. We corroborate our theoretical results on real and synthetic data, where the gains are often on the order of 10x; on some datasets, such as EuroSAT, AdaptiveSoftmax has several hundred times lower sample complexity than the full softmax computation. The adaptive method we propose for estimating the partition function (the softmax denominator) is of independent interest and can be used in other applications such as kernel density estimation.
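The abstract does not spell out the algorithm, but the core idea it describes, adaptively estimating the logits of a matrix-vector product and the partition function from samples, can be illustrated with a minimal sketch. The two-stage sampling schedule, the `budget_frac` and `refine_frac` parameters, and the function names below are illustrative assumptions, not the paper's AdaptiveSoftmax; a faithful implementation would prune candidates with confidence intervals to obtain the stated PAC guarantees.

```python
# A minimal sketch of adaptive top-k softmax estimation: estimate each logit
# (a_i @ x) from a subsample of coordinates, then spend the remaining sample
# budget refining only the promising rows. This is an illustration under
# assumptions, not the paper's AdaptiveSoftmax algorithm.
import numpy as np

def estimate_logits(A, x, idx):
    """Unbiased estimate of A @ x from a uniform coordinate subsample `idx`."""
    d = x.shape[0]
    return (A[:, idx] @ x[idx]) * (d / len(idx))  # rescale to stay unbiased

def adaptive_topk_softmax(A, x, k=1, budget_frac=0.1, refine_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    budget = max(1, int(budget_frac * d))

    # Stage 1: coarse logit estimates for all n rows from a small sample.
    coarse_idx = rng.choice(d, size=budget, replace=False)
    est = estimate_logits(A, x, coarse_idx)

    # Stage 2: re-estimate only the promising rows with a larger sample
    # (a crude stand-in for confidence-interval-based adaptive allocation).
    m = max(k, int(refine_frac * n))
    candidates = np.argsort(est)[-m:]
    refine_idx = rng.choice(d, size=min(d, 4 * budget), replace=False)
    est[candidates] = estimate_logits(A[candidates], x, refine_idx)

    # Estimate the partition function from the logit estimates and return
    # the k largest softmax values (numerically stabilized).
    stable = np.exp(est - est.max())
    probs = stable / stable.sum()
    topk = np.argsort(probs)[-k:][::-1]
    return topk, probs[topk]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((1000, 5000))
    x = rng.standard_normal(5000)
    idx, vals = adaptive_topk_softmax(A, x, k=3)
    print("top-3 indices:", idx, "estimated softmax values:", vals)
```

With uniform coordinate sampling each logit estimate is unbiased, and concentrating the remaining budget on plausible top-$k$ rows is what yields sample-complexity savings of the kind the abstract reports.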
